Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

Excerpt

D s product
rtrue
 provides multiple mechanisms for reviewing your column values and identifying patterns in the data format or similar values which mean the same thing. This section summarizes the available methods of standardization, as well as their recommended uses.

Standardization Methods

Through simple visual tools, you can select the patterns or clustered value to standardize and, when prompted, the patterns or values to use as their standard. As needed, you can apply formatting or structuring functions to the data for finer grain controls.

...

Standardization Methods

You can use any of the following methods for standardizing values in your dataset's columns. Depending on the situation, you may choose to mix-and-match these methods. Details on these methods are below.

MethodDescriptionRecommended UsesHow to Use
By clustering

D s product
can identify similar values using one of the available algorithms for comparing values. You can compare values based on spelling or language-independent pronunciation.

  • Standardize values to correct spelling differences, capitalization, whitespace, and other errors.
  • Values must be consistent across rows of the column.
  • Primarily used for string-based data types.

Available through the Standardize Page

By pattern

D s product
can identify common patterns in a set of values and suggest transformations to standardize the values to a common format.

  • Standardize values to follow a consistent format, such as phone numbers or social security numbers.
  • Data type follows a somewhat consistent format and needs reshaping.
Available in the Patterns tab in Column Details Panel
By functionYou can apply one or more specific functions to cleanse your data of minor errors in formatting or structure.
  • Good method for improving the performance of pattern- or algorithm-based matching.
  • Some functions are specific to a data type, while others have more general application.
Edit column with formula in the Transform Builder.
Mix-and-matchYou can use combinations of the above methods for more complex use cases.
  • Combine function-based standardization for global changes to all values with cluster- or pattern-based standardization for individual value changes.
 

Invalid Values

These standardization techniques assume that your column contains only valid or empty values. 

...

 
CategoryFunctionDescription
String ConversionCHAR Function

D excerpt include
pageCHAR Function
nopaneltrue


UNICODE Function

D excerpt include
pageUNICODE Function
nopaneltrue
 

Case ConversionUPPER Function

D excerpt include
pageUPPER Function
nopaneltrue
 


 LOWER Function

D excerpt include
pageLOWER Function
nopaneltrue
 
 


PROPER Function

D excerpt include
pagePROPER Function
nopaneltrue
 

Cleanse FunctionsTRIM Function

D excerpt include
pageTRIM Function
nopaneltrue
 


TRIMQUOTES Function

D excerpt include
pageTRIMQUOTES Function
nopaneltrue

 


REMOVEWHITESPACE Function

D excerpt include
pageREMOVEWHITESPACE Function
nopaneltrue
 
 


REMOVESYMBOLS Function

D excerpt include
pageREMOVESYMBOLS Function
nopaneltrue
 

String Sizing FunctionsLEFT Function 

D excerpt include
pageLEFT Function
nopaneltrue
 
 


RIGHT Function

D excerpt include
pageRIGHT Function
nopaneltrue
 


 PAD Function 

D excerpt include
pagePAD Function
nopaneltrue
 

 String Comparison Functions 
See Compare Strings

Example:

...

  • The Columns value is a wildcard, which in this case applies the transformation across all columns in the dataset (*).
  • In the Formula, you see a nested expression. If the value in the column is valid against String data type, then, do the following to the column value:

    Info

    NOTE: The IFVALID function tests each row value for validation against the specified data type. It does not test the column against the data type. See IFVALID Function.

     


    • The TRIM function removes leading and trailing whitespace, which may register as a difference between values. See TRIM Function.
    • The UPPER function then converts the output of the TRIM function to all uppercase. So, differences in capitalization are eliminated. See UPPER Function.
    • The LEFT function truncates the output of the UPPER function to a maximum of 32 characters. See LEFT Function.

...


For more information including examples on the DATEFORMAT function, see Format Dates.

D s also
labelstandardization