Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r092

...

MethodDescriptionRecommended UsesHow to Use
By clustering

D s product
can identify similar values using one of the available algorithms for comparing values. You can compare values based on spelling or language-independent pronunciation.

  • Standardize values to correct spelling differences, capitalization, whitespace, and other errors.
  • Values must be consistent across rows of the column.
  • Primarily used for string-based data types.

Available through the Standardize Page

By pattern

D s product
can identify common patterns in a set of values and suggest transformations to standardize the values to a common format.

  • Standardize values to follow a consistent format, such as phone numbers or social security numbers.
  • Data type follows a somewhat consistent format and needs reshaping.
Available in the Patterns tab in Column Details Panel
By functionYou can apply one or more specific functions to cleanse your data of minor errors in formatting or structure.
  • Good method for improving the performance of pattern- or algorithm-based matching.
  • Some functions are specific to a data type, while others have more general application.
Edit column with formula in the Transform Builder.
Mix-and-matchYou can use combinations of the above methods for more complex use cases.
  • Combine function-based standardization for global changes to all values with cluster- or pattern-based standardization for individual value changes.
 

Invalid Values

These standardization techniques assume that your column contains only valid or empty values. 

...

CategoryFunctionDescription
String ConversionCHAR Function

D excerpt include
pageCHAR Function
nopaneltrue

 UNICODE Function

D excerpt include
pageUNICODE Function
nopaneltrue
 

Case ConversionUPPER Function

D excerpt include
pageUPPER Function
nopaneltrue
 

 LOWER Function

D excerpt include
pageLOWER Function
nopaneltrue
 

 PROPER Function

D excerpt include
pagePROPER Function
nopaneltrue
 

Cleanse FunctionsTRIM Function

D excerpt include
pageTRIM Function
nopaneltrue
 


TRIMQUOTES Function

D excerpt include
pageTRIMQUOTES Function
nopaneltrue

 REMOVEWHITESPACE Function

D excerpt include
pageREMOVEWHITESPACE Function
nopaneltrue
 

 REMOVESYMBOLS Function

D excerpt include
pageREMOVESYMBOLS Function
nopaneltrue
 

String Sizing FunctionsLEFT Function 

D excerpt include
pageLEFT Function
nopaneltrue
 

 RIGHT Function

D excerpt include
pageRIGHT Function
nopaneltrue
 

 PAD Function 

D excerpt include
pagePAD Function
nopaneltrue
 

 String Comparison Functions See Compare Strings

Example:

...

  • The Columns value is a wildcard, which in this case applies the transformation across all columns in the dataset (*).
  • In the Formula, you see a nested expression. If the value in the column is valid against String data type, then, do the following to the column value:

    Info

    NOTE: The IFVALID function tests each row value for validation against the specified data type. It does not test the column against the data type. See IFVALID Function.

     

    • The TRIM function removes leading and trailing whitespace, which may register as a difference between values. See TRIM Function.
    • The UPPER function then converts the output of the TRIM function to all uppercase. So, differences in capitalization are eliminated. See UPPER Function.
    • The LEFT function truncates the output of the UPPER function to a maximum of 32 characters. See LEFT Function.

...