Outlier Plots

Find outliers in your dataset that might affect your modeling results.

Outliers are values in your dataset that fall outside the expected distribution of the data. They often result from measurement or data collection errors. Outliers can disproportionately affect the performance of your model, because many machine learning models work best when the input data follows the distribution they assume. We recommend that you remove outliers unless you think they are meaningful for the model you want to build.
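To illustrate how much a single bad value can distort a summary statistic, here is a minimal Python sketch. The readings are hypothetical and are not taken from any real dataset; the last value stands in for a data entry error.

```python
import statistics

# Hypothetical sensor readings (illustrative values only).
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same readings plus one value that simulates a data entry error.
with_outlier = clean + [100.0]

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print(f"mean:   {mean_clean:.2f} -> {mean_outlier:.2f}")
print(f"median: {median_clean:.2f} -> {median_outlier:.2f}")
```

In this example the mean moves from about 10 to about 25, while the median barely changes. Models that depend on means and variances can be affected in a similar way.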

Outlier Plots

Use the box plots in the Outliers panel to find values that fall outside the expected distribution of the data. These outliers might cause problems when you build your machine learning model. For each variable, review the values flagged in red to decide whether to keep them in the dataset. Also examine the histogram to see whether the distribution is skewed. If you find invalid data, select the rows in the Outliers Instances table, and then select Delete Outliers.
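If you want to check flagged values outside the tool, the sketch below applies the conventional box plot rule, which flags points more than 1.5 times the interquartile range (IQR) beyond the quartiles. This is an assumption for illustration: this page does not state which rule the Outliers panel uses, and the function name `flag_outliers`, the multiplier `k`, and the sample `ages` data are all hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule.

    Assumption: the common 1.5 * IQR box plot convention; the Outliers
    panel may use a different rule.
    """
    # Quartiles with linear interpolation between the data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical "age" column with one implausible entry.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 120]
lower, upper, outliers = flag_outliers(ages)
print(f"fences: [{lower}, {upper}], flagged: {outliers}")
```

Here the quartiles are 27 and 35, so the fences are 15 and 47 and only the value 120 is flagged. Whether a flagged value is invalid is still a judgment call: an age of 120 is probably an entry error, but a rare large value in another variable might be legitimate.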