Before you begin performing analytics on a dataset, it is important to identify outlier patterns and values in the data.
Unusual values or patterns in the data can indicate any of the following:
- Missing data.
- Bad data. See Find Bad Data.
- Poorly formatted data.
- Mismeasured data.
- Data that skews statistics.
This section provides guidance on how to locate these patterns of data in individual columns.
For assessing anomalies in individual columns,
|D s product|
When your data contains a significant number of specific values, you should review them to see if the values have meaning. They may be placeholders for missing values. See Find Missing Data.
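As a rough illustration of this kind of review outside the product, the following Python sketch counts how often each value occurs in a column and reports the values that account for a large share of the rows. The function name `frequent_values`, the 20% cutoff, and the sample data are all hypothetical and chosen only for illustration; a suitable cutoff depends on your dataset.

```python
from collections import Counter

def frequent_values(values, min_share=0.2):
    """Return (value, count) pairs whose share of all rows is at least min_share."""
    counts = Counter(values)
    total = len(values)
    # most_common() yields values from most to least frequent.
    return [(v, c) for v, c in counts.most_common() if c / total >= min_share]

# Made-up sample column in which -1 might be a placeholder for a missing value.
sample = [12, -1, 47, -1, 33, -1, 58, 21, -1, 40]
suspects = frequent_values(sample)
print(suspects)
```

A value flagged this way is not necessarily bad. It is only a candidate to review for meaning.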
For numeric data, you should be skeptical of occurrences of the following values:
Column Detail Statistics
The Column Details panel provides information on the following:
Tip: Any green bar in the Column Details panel can be selected to prompt for suggestions on actions, including values in Outliers, Value Histogram, and Frequent Values graphs. Multi-select values as needed.
See Column Details Panel.
Let them be. If the data is valid, do not remove it unless you have an explicit reason for doing so.
Convert to more meaningful values. You can use the set transform to change outlier values to values that are valid for purposes of analysis.
NOTE: Changing values may affect the validity of your statistical analysis.
In the following example, values in the col_numbers column that are below 25 are set to the average value for the column. Otherwise, the current value is kept:
Transformation Name: Edit column with formula
Parameter: Columns: col_numbers
Parameter: Formula: IF((col_numbers < 25), AVERAGE(col_numbers), col_numbers)
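For readers who want to check the logic of this formula outside the product, here is a minimal Python sketch of the same idea. It assumes that the average is computed over all values in the column before any replacement. The function name `replace_below_threshold` and the sample values are hypothetical.

```python
from statistics import mean

def replace_below_threshold(values, threshold=25):
    """Replace every value below threshold with the mean of all values."""
    # Assumption: the mean covers the full column, including the low values.
    column_mean = mean(values)
    return [column_mean if v < threshold else v for v in values]

# Made-up sample data standing in for the col_numbers column.
col_numbers = [10, 30, 50, 70]
result = replace_below_threshold(col_numbers)
print(result)
```

Because the low values are included in the average, they still pull the replacement value toward themselves. This is one reason overwriting outliers can affect later statistics.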