During the Prep Data stage, you upload your data, view your dataset, and assess the health of your data.
To upload data, you can Browse files from My Files or Import files from your computer.
We treat rows that start with a number sign (#) as comments. If the first row in your CSV file is a comment, then the second row becomes the column header.
We also remove empty rows that follow the header row in your CSV file. If a row contains only some empty cells, we keep the row and convert those cells to null values. It's still good practice to upload files without empty rows.
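To preview how these rules might affect a file before you upload it, you can approximate them locally. The following is a minimal sketch in Python, not the product's actual parser: the helper name `preview_csv` is hypothetical, and it assumes that a comment row is any line whose first character is `#` and that no quoted field spans multiple lines.

```python
import csv
import io


def preview_csv(text):
    """Approximate the upload rules described above (hypothetical helper).

    Returns (header, rows). Empty cells become None.
    """
    # Assumption: a comment row is any line whose first character is '#'.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    rows = list(csv.reader(io.StringIO("\n".join(lines))))
    if not rows:
        return [], []

    # The first non-comment row is the column header.
    header, body = rows[0], rows[1:]

    cleaned = []
    for row in body:
        # Drop rows in which every cell is empty (including blank lines).
        if not any(cell.strip() for cell in row):
            continue
        # Keep partially filled rows, turning empty cells into None.
        cleaned.append([cell if cell.strip() else None for cell in row])
    return header, cleaned


sample = "# exported file\nid,name,score\n1,Ada,9.5\n\n2,,7.0\n,,\n"
header, rows = preview_csv(sample)
print(header)
print(rows)
```

In this sample, the comment line is skipped, `id,name,score` becomes the header, the blank line and the all-empty `,,` row are dropped, and the missing name in the last remaining row becomes `None`.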
The Dataset panel shows you any warnings or errors that occur while running automatic data checks. Some errors prevent you from continuing until you resolve them.
To learn more about the data checks we run, select See Details:
- On the Jobs tab, you can see what jobs have run, their statuses, how long the jobs took to run, and their progress.
- On the Data Checks tab, you can see what data checks have run, their statuses, any details, and recommended actions.
Select the X icon or Hide Details to exit the data-checks window.
Each column has a dropdown you can use to set the data type of the column.
To view a summary of the data in each column, turn on Data Profiling.
The Data Health panel helps you assess the quality of your data. We provide a health score for your data based on a few factors for each column: the fraction of missing values, the number of outliers, target leakage, and class imbalance (for classification problems). For a detailed breakdown of how we calculate these scores, refer to Education Mode.
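To get a rough feel for some of these factors on your own data, you can compute similar quantities yourself. The sketch below is illustrative only and does not reproduce the product's scoring: the function names are hypothetical, the 1.5 × IQR outlier rule is an assumption (this page does not say how outliers are detected), and target leakage is not covered.

```python
from collections import Counter
import statistics


def missing_fraction(values):
    """Fraction of entries in a column that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def iqr_outlier_count(values, k=1.5):
    """Count numeric values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule stands in for whatever outlier
    detection the product actually uses.
    """
    nums = [v for v in values if v is not None]
    if len(nums) < 4:
        return 0  # too few points to estimate quartiles meaningfully
    q1, _, q3 = statistics.quantiles(nums, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(v < low or v > high for v in nums)


def minority_to_majority_ratio(labels):
    """Smallest class count divided by largest; 1.0 means balanced."""
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return 0.0
    return min(counts.values()) / max(counts.values())


score_column = [10, 11, None, 12, 13, 14, 100, None]
target_column = ["yes", "yes", "yes", "no"]

print(missing_fraction(score_column))
print(iqr_outlier_count(score_column))
print(minority_to_majority_ratio(target_column))
```

A high missing fraction, many outliers, or a ratio far below 1.0 for the target column are the kinds of signals that would typically lower a column's health.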
When you are ready to go to the next stage, Data Insights, select Next.