Comment: Published by Scroll Versions from space DEV and version r0710

For smaller datasets, the Transformer page displays the entire dataset. For larger ones, the source data is sampled for use in the Transformer page. Through the Samples panel, you can create new samples and select them for display in the Transformer page.

At the top of the Transformer page, the type of the current sample is displayed next to the dataset name. To open the Samples panel, click the current sample indicator:

Figure: Click the current sample button.

In the example above, the current sample is a Random sample. Other sample types that can appear here include the following:

  • Full Data: The entire dataset is small enough to be displayed in the data grid. The data is not sampled.
  • Initial Sample: The sample is taken from the first set of rows in the first file or table that is part of the dataset. Data from the remainder of the first file or table, and from any other files or tables, is not included in the data grid.

    Tip: When the data is first loaded, the initial sample is generated and displayed. For a better representation of the entire dataset, create a new sample.

To create a new sample, click Collect a new sample.

The Samples panel is displayed on the right side of the screen:

Tip: You can also open the Samples panel by clicking the Eyedropper icon at the top of the page.

Figure: Samples panel

...

At the top of the panel, you can review the currently loaded sample. Each user has their own active sample for a dataset.

NOTE: When a new sample is generated, any Sort transformations that have been applied previously must be re-applied. Depending on the type of output, sort order may not be preserved.


  • Initial: By default, the application loads the first N rows of the dataset as the initial sample when the Transformer page is opened. The number of rows depends on column count, data density, and other factors. If the dataset is small enough, the full dataset is used. 

    NOTE: By default, samples may be up to 10 MB in size. For datasets smaller than this limit, the entire dataset is loaded.

  • Click the link in the current sample card to see the list of all available samples.

    Tip: To rename a sample, click its card in the list of all available samples, and then click the Edit icon.
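The Initial sample behavior and the 10 MB default limit described above can be sketched as follows. This is a minimal illustration, not the application's actual implementation: it assumes the data arrives as text lines, measures size as UTF-8 bytes per line, treats 10 MB as 10 × 1024 × 1024 bytes, and uses a hypothetical `initial_sample()` function. Wider or denser rows use up the byte budget faster, which is one way the row count can depend on column count and data density.

```python
from typing import Iterable, List, Tuple

# Assumption: 10 MB interpreted as 10 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 10 * 1024 * 1024


def initial_sample(lines: Iterable[str], limit_bytes: int = DEFAULT_LIMIT_BYTES) -> Tuple[List[str], bool]:
    """Hypothetical sketch: take lines from the start of the data until the byte budget is used up.

    Returns (sample, is_full). is_full is True when every line fit within the budget,
    which corresponds to the Full Data case.
    """
    sample: List[str] = []
    used = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if used + size > limit_bytes:
            # Budget exhausted: only the first rows are kept (Initial Sample).
            return sample, False
        sample.append(line)
        used += size
    # The whole dataset fit within the limit (Full Data).
    return sample, True


rows = ["id,amount\n"] + [f"{i},{i * 10}\n" for i in range(1000)]
sample, is_full = initial_sample(rows, limit_bytes=100)
print(len(sample), is_full)  # prints: 16 False
```

With the default limit, the same small dataset fits entirely and `is_full` is `True`.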

...

Below the current sample, you can review the available options for creating new samples. Each type of sample reflects a different method of collection.

  • To collect a new sample, click the appropriate sample card. See below.

...

A new sample is collected from the data as it exists at the current location in the recipe. For example, if the recipe contains steps that join in other datasets, those joins are performed first, and the sample is drawn from the joined data.
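To illustrate the point above, here is a minimal sketch in which a recipe is modeled as an ordered list of functions and the sample is drawn only after the steps up to the current recipe location have run. The names `rows_at_step()`, `make_join()`, and `collect_sample()` are hypothetical, the join is a simple inner join on a unique key, and the "sample" is just the first n rows; this does not describe how the application actually executes recipes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def rows_at_step(source: List[Row], recipe: List[Step], current_step: int) -> List[Row]:
    """Run the recipe steps that precede the current location (steps 0..current_step-1)."""
    rows = source
    for step in recipe[:current_step]:
        rows = step(rows)
    return rows


def make_join(other: List[Row], key: str) -> Step:
    """Hypothetical inner-join step; assumes `key` values are unique in `other`."""
    lookup = {r[key]: r for r in other}

    def join(rows: List[Row]) -> List[Row]:
        # Keep only rows with a match and merge in the other dataset's columns.
        return [{**row, **lookup[row[key]]} for row in rows if row[key] in lookup]

    return join


def collect_sample(source: List[Row], recipe: List[Step], current_step: int, n: int) -> List[Row]:
    """Draw the sample (here simply the first n rows) from the data at the current step."""
    return rows_at_step(source, recipe, current_step)[:n]


orders = [{"id": 1, "cust": "a"}, {"id": 2, "cust": "b"}, {"id": 3, "cust": "z"}]
customers = [{"cust": "a", "region": "EU"}, {"cust": "b", "region": "US"}]
recipe = [make_join(customers, "cust")]
print(collect_sample(orders, recipe, current_step=1, n=10))
# [{'id': 1, 'cust': 'a', 'region': 'EU'}, {'id': 2, 'cust': 'b', 'region': 'US'}]
```

Sampling at step 0 would return the unjoined `orders` rows, because the join step has not run yet at that location.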

Figure: Collect new sample panel

...

  • Name: Enter a name for the sample, if needed.

    Tip: Naming your samples makes them easier to track later. For example, you might add a date stamp to the name to record when you captured the sample.

  • Scan Type: (Does not apply to all sampling methods) Select one of the following scan types:
    • Quick - performs a random scan of the dataset to extract the appropriate number of rows for the sample.
    • Full - gathers the sample from the entire dataset. Depending on the size of the dataset, this method can take a long time.

  • Column or columns: (Stratified, Cluster-based) Name of the column from which to gather values to evaluate. (Anomaly-based) Name or names of one or more columns containing the anomalies to include in your sample. You can specify multiple columns as comma-separated values, or a range of columns by using the tilde (~) character.
  • Condition: (Filter-based, Stratified, Cluster-based, Anomaly-based) Filter the sample based on a specified condition. For example:

    invoiceDate > 90
  • Anomaly type: (Anomaly-based) Select the type of anomalous values to include in your sample: invalid, missing, or both.

  • To begin collecting the sample, click Collect.
  • You can continue working while the sample is collected. When the sample is available, a status message is displayed in the Transformer page.
  • To begin using the sample, click Load Sample in the Samples panel.
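The Column or columns syntax described above (comma-separated names, with a tilde for a range) can be sketched as a small resolver. This is a hypothetical illustration: `resolve_columns()` is not part of the product, a range is assumed to be inclusive and to follow the dataset's column order, and treating a backwards range or an unknown column name as an error is an assumption.

```python
from typing import List


def resolve_columns(spec: str, columns: List[str]) -> List[str]:
    """Hypothetical sketch: expand a spec such as "id, amount~status" into column names.

    Commas separate entries; "start~end" selects every column from start to end
    (inclusive, assumed) in the dataset's column order. Duplicates are dropped.
    """
    result: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "~" in part:
            start, end = (p.strip() for p in part.split("~", 1))
            # list.index raises ValueError if either endpoint is not a column.
            i, j = columns.index(start), columns.index(end)
            if i > j:
                raise ValueError(f"range {part!r} runs backwards")  # assumed behavior
            selected = columns[i : j + 1]
        else:
            if part not in columns:
                raise ValueError(f"unknown column {part!r}")
            selected = [part]
        for name in selected:
            if name not in result:  # keep the first occurrence only
                result.append(name)
    return result


cols = ["id", "invoiceDate", "amount", "region", "status"]
print(resolve_columns("id, amount~status", cols))
# ['id', 'amount', 'region', 'status']
```

Keeping the dataset's column order for ranges means a range such as `amount~status` always picks up any columns that sit between the two endpoints.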

...