Comment: Published by Scroll Versions from space DEV and version r0811

...

Visual profiling also generates statistics on the values in each column of the dataset. You can use these statistics to assess the overall quality of the source data. The visual profile is stored as part of the job record, which remains in the system after execution.

For more information, see Profile Your Source Data.

Generate a new random sample

...

  • Counts of valid, unique, mismatched, and missing values.
  • Breakdowns by quartile and information on maximum, minimum, and mean values.
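To make these statistics concrete, here is a minimal Python sketch that computes similar counts and numeric summaries for a single column outside the product. The function names `profile_column` and `is_number` are hypothetical, and the definitions used here (empty string or `None` counts as missing, unique counts distinct non-missing values) are assumptions that may differ from how the product computes them.

```python
from statistics import mean, quantiles


def is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile_column(values, is_valid):
    """Summarize one column: value counts plus basic numeric statistics.

    Assumption: None and "" are treated as missing values.
    `is_valid` is a caller-supplied check for the expected data type.
    """
    present = [v for v in values if v is not None and v != ""]
    valid = [v for v in present if is_valid(v)]

    stats = {
        "valid": len(valid),
        "unique": len(set(present)),
        "mismatched": len(present) - len(valid),
        "missing": len(values) - len(present),
    }

    # Quartiles need at least two data points.
    numbers = [float(v) for v in valid]
    if len(numbers) >= 2:
        q1, median, q3 = quantiles(numbers, n=4)
        stats.update(
            min=min(numbers), q1=q1, median=median, q3=q3,
            max=max(numbers), mean=mean(numbers),
        )
    return stats


profile = profile_column(["10", "20", "20", "abc", "", None, "40"], is_number)
print(profile)
```

A high mismatched or missing count relative to the valid count is usually the first signal that a column needs attention.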

For more information, see Column Details Panel.

Available statistics depend on the data type for the column. For more information, see Locate Outliers. 

Data range checks

Standard deviation ranges

...

You can perform ad-hoc tests for uniqueness of individual values. For more information, see Deduplicate Data. 
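As an illustration of what a uniqueness test checks, the following Python sketch finds values that occur more than once in a column. It is not the product's rule syntax; the helper `find_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicates(rows, column):
    """Return a dict of value -> occurrence count for values seen more than once."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample data with one repeated identifier.
rows = [{"custId": "C001"}, {"custId": "C002"}, {"custId": "C001"}]

duplicates = find_duplicates(rows, "custId")
all_unique = not duplicates
print(duplicates, all_unique)
```

Returning the duplicated values, rather than only a pass/fail flag, makes it easier to decide whether to deduplicate or to investigate the source.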

Data quality rule:

The following data quality rule verifies that all of the values in the custId column are unique:

...

Click the gray bar to display a set of suggestion cards for handling those values.

For more information, see Find Missing Data.

Null values

While null values are categorized with missing values, they are not the same thing. In some cases, it might be important to distinguish the actual null values within your dataset, and several language features can assist in finding them. See Manage Null Values.
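The distinction can be sketched in Python as follows. This assumes that a null is represented as `None` and that a missing value is an empty or whitespace-only string; the product's own representation may differ, and `classify_cell` is a hypothetical helper.

```python
from collections import Counter


def classify_cell(value):
    """Label a cell as 'null', 'empty', or 'present'.

    Assumption: None models a true null; an empty or
    whitespace-only string models a missing (empty) value.
    """
    if value is None:
        return "null"
    if isinstance(value, str) and value.strip() == "":
        return "empty"
    return "present"


cells = ["A12", "", None, "  ", "B7"]
summary = Counter(classify_cell(cell) for cell in cells)
print(summary)
```

Counting the two categories separately shows whether a gap came from an absent value or from a field that was explicitly set to null.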

Validate data against other data

...

  • Some problems in the data might have been generated in the source system. If you plan to use additional sources from this system, you should try to get these issues corrected in the source and, if necessary, have your source data regenerated. 
  • Some data quality issues can be ignored. For the sake of downstream consumers of the data, you might want to annotate your dataset with information about possible issues. Be sure to tell consumers how to identify this information.
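One way to annotate a dataset is to add a column that lists the checks each row failed. The Python sketch below illustrates the idea only; the column name `quality_issues`, the helper `annotate_rows`, and the two checks are hypothetical choices, not a product convention.

```python
def annotate_rows(rows, checks):
    """Return copies of rows with a `quality_issues` column.

    `checks` maps an issue name to a function that returns True
    when the row passes. Failed issue names are joined with ';'.
    """
    annotated = []
    for row in rows:
        failed = [name for name, passes in checks.items() if not passes(row)]
        annotated.append({**row, "quality_issues": ";".join(failed)})
    return annotated


# Hypothetical checks and sample rows.
checks = {
    "missing_email": lambda row: bool(row.get("email")),
    "negative_amount": lambda row: row.get("amount", 0) >= 0,
}
rows = [
    {"custId": "C001", "email": "a@example.com", "amount": 25.0},
    {"custId": "C002", "email": "", "amount": -5.0},
]

annotated = annotate_rows(rows, checks)
for row in annotated:
    print(row["custId"], repr(row["quality_issues"]))
```

Keeping the annotations in a dedicated column leaves the original values untouched, so consumers can filter on the flag or ignore it.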
