During the Auto Model stage, Machine Learning automatically trains your models and ranks them in a leaderboard. You then select the model you want to use. You can also see which features Machine Learning automatically created as part of the modeling and scoring process, and you can modify your model settings.
Machine Learning automatically chooses a performant model for you, but you can select a different model from the dropdown. To view a list of available models for each machine learning method, go to Machine Learning Models.
The Leaderboard panel shows you information about the model you've selected, the models available for you to select, and the metric we've used to rank models in the leaderboard.
For any model you select, we show its score based on the ranking metric, how many features the model uses, how much better it has performed than a baseline model, and some highlights of the machine learning pipeline we've used to build the model.
The score differs based on the ranking metric you select. To learn more about specific ranking metrics, select the book icon in the upper-right corner.
The baseline model simulates a random guess.
In the Feature Engineering panel, use primitives to create new, engineered features. Engineered features help to better represent the underlying problem in your model. You can also review helpful information about the correlations of your new features.
Features Versus Columns
In Machine Learning, we refer to the columns of your dataset as features. Features are measurable values or characteristics of your data.
Check the box next to each primitive you want to include in Feature Engineering. Then select Save Changes to automatically calculate new features. Note that a selected primitive applies to all the columns that have a matching data type.
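To illustrate how a selected primitive applies to every column with a matching data type, here is a minimal sketch in plain Python. The primitive names, the `PRIMITIVES` registry, and the `apply_primitives` helper are hypothetical and exist only for this example. They are not the product's API.

```python
import math

# Hypothetical registry: each primitive declares the input type it accepts
# and the function used to compute the engineered value.
PRIMITIVES = {
    "natural_logarithm": (float, lambda x: math.log(x)),
    "num_characters": (str, lambda s: len(s)),
}


def apply_primitives(columns, selected):
    """Create one engineered feature per (primitive, matching column) pair."""
    engineered = {}
    for prim_name in selected:
        input_type, func = PRIMITIVES[prim_name]
        for col_name, values in columns.items():
            # A primitive applies to every column whose values match its type.
            if all(isinstance(v, input_type) for v in values):
                engineered[f"{prim_name}({col_name})"] = [func(v) for v in values]
    return engineered


columns = {
    "price": [1.0, 10.0, 100.0],
    "weight": [2.5, 4.0, 8.0],
    "name": ["ab", "cde", "f"],
}

# Selecting one numeric primitive produces a feature for each numeric column.
features = apply_primitives(columns, ["natural_logarithm"])
print(sorted(features))
```

In this sketch, one selected primitive yields two new features because two columns match its input type. This is why selecting many primitives at once can quickly multiply the number of features.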
Don't initially turn on all primitives. A large number of features might make it difficult for the model to find patterns in your data. We recommend that you read through each primitive description. Give some thought to the primitives that might be useful for your data.
The Features tab shows the features that we calculated from your selected primitives. We mark engineered features with an asterisk (*). The Applied Primitive column displays the primitive we used to create each feature.
Select Show Origin Features to include the original features of your dataset in the list.
The correlation matrix shows the strength of correlations between your original and engineered features. Select a cell in the correlation matrix to show the relationship between its pair of features.
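As a rough illustration of what each cell in a correlation matrix represents, the sketch below computes pairwise correlations for a few features. It assumes the Pearson correlation coefficient, which this page does not specify. The feature names and values are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """Nested dict where matrix[a][b] is the correlation of features a and b."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Illustrative data; the asterisk marks an engineered feature.
features = {
    "price": [1.0, 2.0, 3.0, 4.0],
    "price_squared*": [1.0, 4.0, 9.0, 16.0],
    "discount": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near 1 or -1 indicate a strong linear relationship between the pair, and values near 0 indicate a weak one.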
Tune the automated machine learning process. Make sure to select Save Changes if you make any changes.
Select the objective function you want to use to determine the ranking of models you've trained. Objective functions are measures of how optimal a model is for your use case. To learn more about specific metrics, select the book icon in the header or Learn More while on the Leaderboard panel during the Auto Model stage.
At a minimum, Auto Model search runs all allowed estimator families for the given problem type. Use this option to give the search more time to evaluate additional pipelines, which might improve modeling results.
Choose how many folds to use during cross-validation.
Enter the percentage of your original data that you want to use as holdout data.
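The sketch below shows how a holdout percentage and a fold count might interact: the holdout rows are set aside first, and the remaining rows are divided into cross-validation folds. This is a simplified illustration, not the product's implementation. The function names are hypothetical, and the sketch does not shuffle or stratify rows, which real tools often do.

```python
def split_holdout(n_rows, holdout_pct):
    """Reserve the last holdout_pct percent of row indices as holdout data."""
    n_holdout = round(n_rows * holdout_pct / 100)
    train_idx = list(range(n_rows - n_holdout))
    holdout_idx = list(range(n_rows - n_holdout, n_rows))
    return train_idx, holdout_idx


def k_folds(indices, n_folds):
    """Return (training, validation) index pairs, one pair per fold."""
    folds = []
    for i in range(n_folds):
        # Every n_folds-th row, starting at offset i, validates this fold.
        validation = indices[i::n_folds]
        validation_set = set(validation)
        training = [j for j in indices if j not in validation_set]
        folds.append((training, validation))
    return folds


train_idx, holdout_idx = split_holdout(n_rows=100, holdout_pct=20)
folds = k_folds(train_idx, n_folds=4)

print(len(train_idx), len(holdout_idx))
for training, validation in folds:
    print(len(training), len(validation))
```

Each non-holdout row is used for validation in exactly one fold and for training in the others, while the holdout rows stay out of cross-validation entirely.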
Set up your time series model.
Important
The Time Series Setup panel is available only if you select Time Series Regression under Machine Learning Method.
The Primary DateTime Column indicates your time index. To change this column, return to the Problem Setup stage.
Adjust the time intervals for your model. To change the Forecast Horizon, return to the Problem Setup stage.
Training Window Size: Select how far back in time your model should reference to make predictions.
Data Access Gap: Select the amount of time between the end of your training window and the start of your prediction window.
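To show how these intervals relate to one another, the following sketch lays out the training and prediction windows on a calendar. It assumes all intervals are measured in days and that windows are half-open (the end date is excluded). The function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def time_windows(prediction_start, training_window_days, gap_days, forecast_horizon_days):
    """Compute half-open [start, end) training and prediction windows."""
    # The gap separates the end of training from the start of prediction.
    train_end = prediction_start - timedelta(days=gap_days)
    # The training window reaches back from the end of training.
    train_start = train_end - timedelta(days=training_window_days)
    # The forecast horizon extends forward from the start of prediction.
    prediction_end = prediction_start + timedelta(days=forecast_horizon_days)
    return {
        "train": (train_start, train_end),
        "predict": (prediction_start, prediction_end),
    }


windows = time_windows(
    prediction_start=date(2024, 3, 1),
    training_window_days=30,
    gap_days=7,
    forecast_horizon_days=14,
)
print(windows["train"])
print(windows["predict"])
```

A larger gap models the situation where the most recent data is not yet available when you need to make a prediction.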
Decomposition splits your Original Data into its Decomposed Components. These components reflect the Trend, Seasonal, and Residual characteristics of your Observed data.
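For intuition, here is a minimal sketch of a classical additive decomposition, where observed = trend + seasonal + residual. This page does not state which decomposition method the product uses, so treat the moving-average approach, the odd-period restriction, and the function name as assumptions made for illustration.

```python
def decompose_additive(observed, period):
    """Split a series into trend, seasonal, and residual components.

    Assumes an odd period and a series long enough that every seasonal
    position has at least one point with a defined trend value.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    half = period // 2
    n = len(observed)

    # Trend: centered moving average; undefined (None) at the edges.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(observed[i - half : i + half + 1]) / period

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(observed[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: whatever the trend and seasonal components do not explain.
    residual = [
        None if trend[i] is None else observed[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: a linear trend plus a repeating 3-step pattern.
pattern = [1, 0, -1]
observed = [i + pattern[i % 3] for i in range(12)]
trend, seasonal, residual = decompose_additive(observed, period=3)
print(trend)
print(seasonal)
print(residual)
```

Because the synthetic series contains no noise, the residual is zero wherever the trend is defined. In real data, the residual captures the irregular variation that remains.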