The process of cleansing, enhancing, and transforming your data can introduce significant changes to it, some of which might not be intended. This page provides tips and techniques for validating your dataset from the start to the finish of your data wrangling efforts.
Data validation can be broken down into the following categories:
Visual profiling also generates statistics on the values in each column in the dataset. You can use this statistical information to assess the overall data quality of the source data. This visual profile information is part of the record for the job, which remains in the system after execution. For more information, see Profile Your Source Data.
Generate a new random sample
- Counts of valid, unique, mismatched, and missing values.
- Breakdowns by quartile and information on maximum, minimum, and mean values.
Available statistics depend on the data type for the column. For more information, see Locate Outliers.
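If you want to cross-check these statistics outside the application, the following is a minimal Python sketch of the same kinds of counts and summary values. It is an illustration only, not the product's implementation: the function names `profile_column`, `numeric_stats`, and `is_number` are hypothetical, the rule that empty strings and `None` count as missing is an assumption, and the quartile method used by Python's `statistics.quantiles` may differ from the one used in the visual profile.

```python
import statistics


def is_blank(value):
    """Assumption: None and empty/whitespace-only strings count as missing."""
    return value is None or str(value).strip() == ""


def is_number(value):
    """Hypothetical validity test for a numeric column."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile_column(values, is_valid):
    """Count valid, unique, mismatched, and missing values in one column."""
    present = [v for v in values if not is_blank(v)]
    valid = [v for v in present if is_valid(v)]
    return {
        "valid": len(valid),
        "unique": len(set(present)),
        "mismatched": len(present) - len(valid),
        "missing": len(values) - len(present),
    }


def numeric_stats(values):
    """Minimum, quartiles, maximum, and mean of the numeric values."""
    numbers = sorted(float(v) for v in values if not is_blank(v) and is_number(v))
    q1, median, q3 = statistics.quantiles(numbers, n=4)
    return {
        "min": numbers[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": numbers[-1],
        "mean": statistics.mean(numbers),
    }


# Made-up sample column containing a blank, a None, and a non-numeric value.
column = ["10", "20", "", "abc", "20", None, "30"]
counts = profile_column(column, is_number)
stats = numeric_stats(column)
```

Comparing `counts` and `stats` for a column before and after a transformation step is one way to spot unintended changes.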
Data range checks
Standard deviation ranges
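As a rough illustration of a standard deviation range check, the following Python sketch flags values that fall more than a chosen number of standard deviations from the mean. The function name `outside_stdev_range` and the default threshold of 2 are assumptions made for this example, and the sketch uses the population standard deviation. It is not the application's own logic.

```python
import statistics


def outside_stdev_range(numbers, num_stdevs=2.0):
    """Return the values lying outside mean +/- num_stdevs * standard deviation."""
    mean = statistics.mean(numbers)
    stdev = statistics.pstdev(numbers)  # population standard deviation
    low = mean - num_stdevs * stdev
    high = mean + num_stdevs * stdev
    return [x for x in numbers if x < low or x > high]


# Made-up sample values with one obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 95]
flagged = outside_stdev_range(sample)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a tight threshold can miss outliers that a quartile-based check would catch.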
You can perform ad-hoc tests for uniqueness of individual values. For more information, see Deduplicate Data.
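Outside the application, an ad-hoc uniqueness test can be sketched in a few lines of Python. The helper `find_duplicates` and the sample identifiers are hypothetical and serve only to show the idea: count each value and report those that appear more than once.

```python
from collections import Counter


def find_duplicates(values):
    """Map each value that appears more than once to its occurrence count."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up customer identifiers; "C002" appears twice.
cust_ids = ["C001", "C002", "C003", "C002"]
duplicates = find_duplicates(cust_ids)
```

An empty result means every value in the column is unique.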
Data quality rule:
The following data quality rule verifies that all of the values in the custId column are unique:
Click the gray bar to prompt for a set of suggestion cards for handling those values. For more information, see Find Missing Data.
While null values are categorized with missing values, they are not the same thing. In some cases, it might be important to distinguish the actual null values within your dataset, and several
Validate data against other data
- Some problems in the data might have been generated in the source system. If you plan to use additional sources from this system, you should try to get these issues corrected in the source and, if necessary, have your source data regenerated.
- Some data quality issues can be ignored. For the sake of downstream consumers of the data, you might want to annotate your dataset with information about possible issues. Be sure to tell consumers how to identify this information.