When a dataset is first loaded into the Transformer, the default sampling collects the first N rows of data, depending on the size and density of each row. However, your dataset might contain variations in the data that are not present in this first sample.
Tip: When you first load a dataset in the Transformer page, a new random sample is generated for you from the entire dataset. When the sample is ready, you can select it from the Sampling menu at the top of the Transformer page. The sample is loaded into the Transformer page, and the data quality bars and histograms are updated based on this information. You can generate a new random sample whenever you make a change to the number of rows in your dataset. For more information, see Sampling Menu.
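The difference between the two sampling approaches can be sketched in Python. This is an illustration only and not the product's sampling implementation. `first_n_sample` mimics taking the first N rows. `random_sample` draws rows uniformly from the entire dataset using reservoir sampling (Algorithm R), a standard single-pass technique chosen here as an assumption. Both function names are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def first_n_sample(rows: Iterable[T], n: int) -> List[T]:
    """Hypothetical: take only the first n rows, like an initial first-N sample."""
    sample: List[T] = []
    for row in rows:
        if len(sample) >= n:
            break
        sample.append(row)
    return sample


def random_sample(rows: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical: reservoir sampling (Algorithm R).

    Returns a uniform random sample of up to n rows drawn from the whole
    input in a single pass, so later rows can appear in the sample.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: row kept with probability n/(i+1)
            if j < n:
                reservoir[j] = row
    return reservoir


data = list(range(1000))
head = first_n_sample(data, 10)           # always rows 0..9
spread = random_sample(data, 10, seed=1)  # 10 rows from anywhere in the dataset
```

A first-N sample can never show variations that begin after row N. A random sample gives every row the same chance of inclusion.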
The above transform checks the values in the Primary_Website_or_URL column against the Url data type. If the value in the source column is not a valid URL, then the new column value is true. After sorting by this new column, all of the invalid URLs are displayed next to each other in the data grid, where you can review them in detail.
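The URL check and sort can be sketched in Python as follows. This sketch rests on an assumption: the exact rules of the Url data type are not described here, so `is_invalid_url` treats a value as valid only when it has an http or https scheme and a host. The helper name and the `invalid_URL` column name are hypothetical.

```python
from urllib.parse import urlparse


def is_invalid_url(value: str) -> bool:
    """Hypothetical stand-in for validation against the Url data type.

    Assumption: valid means an http/https scheme plus a non-empty host.
    The product's actual Url rules may differ.
    """
    try:
        parsed = urlparse(value.strip())
    except ValueError:  # e.g. malformed bracketed hosts such as "http://[bad"
        return True
    return not (parsed.scheme in ("http", "https") and parsed.netloc)


rows = [
    {"Primary_Website_or_URL": "https://example.com"},
    {"Primary_Website_or_URL": "not a url"},
    {"Primary_Website_or_URL": "http://example.org/about"},
    {"Primary_Website_or_URL": "example"},
]

# Derive a boolean column that is True when the value is not a valid URL.
for row in rows:
    row["invalid_URL"] = is_invalid_url(row["Primary_Website_or_URL"])

# Stable sort with invalid rows (True) first, so they are grouped together.
rows.sort(key=lambda r: not r["invalid_URL"])
```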
derive value:((Rating < 10) || (Rating > 90)) as:'outlier_Rating'
After the above transform is added, you can sort by the generated column to group your customer outliers together.
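The following Python sketch mirrors the logic of the `derive` transform above. It flags a row when `Rating` is below 10 or above 90, then sorts so that the flagged rows appear together. The sample data, the `customer` column, and the `flag_outliers` helper are hypothetical and serve only as illustration.

```python
def flag_outliers(rows, column="Rating", low=10, high=90, flag="outlier_Rating"):
    """Hypothetical helper mirroring:
    derive value:((Rating < 10) || (Rating > 90)) as:'outlier_Rating'
    """
    for row in rows:
        row[flag] = row[column] < low or row[column] > high
    return rows


ratings = [
    {"customer": "A", "Rating": 5},
    {"customer": "B", "Rating": 50},
    {"customer": "C", "Rating": 95},
    {"customer": "D", "Rating": 72},
]

flag_outliers(ratings)
# Stable sort with outliers (True) first, keeping original order within each group.
ratings.sort(key=lambda r: not r["outlier_Rating"])
```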
Entire rows can be tested for duplication. The deduplicate transform removes identical rows. Note that rows that differ only in whitespace or letter case are treated as different rows. For more information, see Deduplicate Data.
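The exact-match behavior can be illustrated with a small Python sketch. This is not the product's implementation. `deduplicate` keeps the first occurrence of each identical row, an ordering choice assumed here for illustration. The optional `normalize` step is a hypothetical pre-processing idea that trims whitespace and lowercases strings so that near-duplicates become exact duplicates.

```python
def deduplicate(rows):
    """Hypothetical: drop rows that are exact duplicates, keeping the first.

    Values are compared exactly, so rows that differ only in whitespace
    or letter case are kept as distinct rows.
    """
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def normalize(row):
    """Hypothetical optional pre-step: trim and lowercase string values."""
    return tuple(v.strip().lower() if isinstance(v, str) else v for v in row)


rows = [
    ("Acme", "NY"),
    ("Acme", "NY"),   # exact duplicate: removed
    ("acme ", "NY"),  # differs in case and whitespace: kept
]

exact = deduplicate(rows)
normalized = deduplicate([normalize(r) for r in rows])
```

Without normalization, `exact` keeps two rows. After normalization, all three rows collapse into one.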