Comment: Published by Scroll Versions from space DEV and version r093


Before you begin performing analytics on a dataset, it is important to identify and recognize outlier data patterns and values. 

Unusual values or patterns in the data can be sources for the following:

This section provides guidance on how to locate these patterns of data in individual columns.

...

Single-column outliers

For assessing anomalies in individual columns, the product provides visual features and statistical information that help you locate them quickly.

...

When a specific value appears a significant number of times in your data, review it to determine whether the value has meaning. Frequently repeated values may be placeholders for missing values. See Find Missing Data.
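As a rough illustration of this review, the following Python sketch counts how often each value appears in a column and reports values that account for an unusually large share of the non-empty entries. The function name frequent_values and the 5% default threshold (min_share) are hypothetical choices for this example, not product settings.

```python
from collections import Counter


def frequent_values(column, min_share=0.05):
    """Return (value, count) pairs for values that make up at least
    min_share of the non-empty entries in a column.

    Hypothetical helper: min_share is an illustrative threshold only.
    """
    # Ignore entries that are already empty so they do not dominate the counts.
    present = [v for v in column if v is not None and v != ""]
    if not present:
        return []
    counts = Counter(present)
    threshold = min_share * len(present)
    # most_common() orders values from most to least frequent.
    return [(value, n) for value, n in counts.most_common() if n >= threshold]
```

For example, frequent_values([34, 27, 0, 45, 0, 0, 51, 38, 0, 29], min_share=0.2) returns [(0, 4)], flagging 0 as a candidate placeholder worth checking.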

For numeric data, you should be skeptical of occurrences of the following values:

...

Column Detail Statistics

The Column Details panel provides information on the following:

...

Tip: You can select any green bar in the Column Details panel, including values in the Outliers, Value Histogram, and Frequent Values graphs, to prompt for suggested actions. Multi-select values as needed.

See Column Details Panel.

Outliers

The product uses a special set of computations to identify values that it designates as outliers.
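The exact computations the product uses are not described here. For illustration only, the following Python sketch applies Tukey's fences, a common rule of thumb that flags values lying more than k times the interquartile range (IQR) below the first quartile or above the third quartile. The function name iqr_outliers and the conventional k = 1.5 are assumptions for this example, not the product's method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Illustrative sketch only; not the product's outlier computation.
    """
    data = [v for v in values if v is not None]
    if len(data) < 2:
        # Quartiles are not meaningful for fewer than two values.
        return []
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]
```

For the values [10, 12, 11, 13, 12, 11, 95], the quartiles are 11 and 13, so the fences are 8 and 16 and only 95 is flagged.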

...

  • Let them be. If the data is valid, do not remove it unless you have an explicit reason for doing so. 

  • Convert to more meaningful values. You can use the set transform to change outlier values to values that are valid for purposes of analysis.  

    NOTE: Changing values may affect the validity of your statistical analysis.

    The following example overwrites values in the col_numbers column: values below 25 are set to the average value of the column, and all other values are left unchanged:

    Transformation Name: Edit column with formula
    Parameter: Columns: col_numbers
    Parameter: Formula: IF((col_numbers < 25), AVERAGE(col_numbers), col_numbers)
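To check the logic of this formula outside the product, the following Python sketch mirrors it on a plain list. It assumes the column contains only numeric values (no missing entries) and that AVERAGE is taken over the original column; the function name replace_low_with_average is hypothetical.

```python
from statistics import fmean


def replace_low_with_average(col_numbers, cutoff=25):
    """Mirror IF((col_numbers < 25), AVERAGE(col_numbers), col_numbers).

    Hypothetical helper. The average is computed once over the original
    column, including the low values that are about to be replaced.
    """
    avg = fmean(col_numbers)
    # Replace values below the cutoff; keep every other value as-is.
    return [avg if v < cutoff else v for v in col_numbers]
```

Because the average in this sketch includes the values being replaced, very low outliers pull the replacement value down. For example, replace_low_with_average([10, 30, 50]) returns [30.0, 30, 50].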

 
