...

When a dataset is first loaded into the Transformer, the default sample collects the first N rows of data, depending on the size and density of each row. However, your dataset might contain variations that are not represented in this initial sample.

Tip: When you first load a dataset in the Transformer page, a new random sample is generated for you from the entire dataset. When the sample is ready, you can select it from the Sampling menu at the top of the Transformer page. The sample is loaded into the Transformer page, and the data quality bars and histograms are updated based on this information. You can generate a new random sample whenever you make a change to the number of rows in your dataset. For more information, see Sampling Menu.

Validate Consistency

The web application provides useful features for checking that your data is consistent across its rows. With a few recipe steps, you can create custom validation checks to verify values.

...

The above checks the values in the Primary_Website_or_URL column against the Url data type. If the value in the source column is not a valid URL, then the new column value is true. After sorting by this new column, all of the invalid URLs are displayed next to each other in the data grid, where you can review them in detail.
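A check of this kind can be written as a single derive step. The following is a sketch only; it assumes the ismismatched function is available and uses mismatched_URL as an illustrative output column name:

derive value:ismismatched(Primary_Website_or_URL, ['Url']) as:'mismatched_URL'

The expression evaluates to true for any value that does not match the Url data type, so sorting by mismatched_URL groups the failing rows together for review.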

...


derive value:((Rating < 10) || (Rating > 90)) as:'outlier_Rating'

After the above transform is added, you can sort by the generated column to group the outlier rows together.
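One way to do this is with a sort step. This is a sketch, assuming the sort transform; the leading minus sign sorts in descending order so the true values appear first:

sort order: -outlier_Rating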

Duplicate rows

Entire rows can be tested for duplication. The deduplicate transform allows you to remove identical rows. Note that whitespace and case differences are evaluated as different rows. For more information, see Deduplicate Data.
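Because the comparison is exact, you may want to normalize values before deduplicating. The following sketch assumes an illustrative company_name column and uses the set transform with the trim and lower functions to standardize it before removing identical rows:

set col: company_name value: trim(lower(company_name))
deduplicate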

...