Comment: Published by Scroll Versions from space DEV and version r093

...

When you transform your data in the Transformer page, you are performing these transformations on a sample of the total dataset. As needed, you can generate new samples using a variety of algorithms to acquire other slices of your data.

The initial data sample is always collected from the initial rows of the dataset. Whenever you create a recipe and open the dataset in the Transformer page, the application automatically generates the initial sample.

  • By default, the initial sample is the first 10 MB of your dataset.
    • The size of the sample can be modified by an administrator.
    • For file-based sources, the initial sample is taken from a limited number of files.
      • By default, this limit is set to 10 files.
      • The maximum number of files from which a sample can be generated can be defined by an administrator.
  • If your dataset is less than 10 MB, then the entire dataset may be loaded as the initial sample.
  • For datasets larger than 10 MB, the first 10 MB of rows are loaded into the Transformer page.

Tip: On the Transformer page, this first sample is listed as Initial Data. For more information on how this special sampling type is generated, see Overview of Sampling.
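
The initial-sample rules above can be illustrated with a minimal sketch. It is not the product's implementation: the function name `initial_sample`, its parameters, and the choice to treat 10 MB as 10 × 1024 × 1024 bytes are assumptions made for illustration. It reads leading rows, file by file, until either the size budget or the file limit is reached.

```python
from typing import Iterable, List

# Assumed defaults for illustration; an administrator can change the real values.
DEFAULT_SAMPLE_BYTES = 10 * 1024 * 1024  # "10 MB" interpreted as 10 MiB (assumption)
DEFAULT_MAX_FILES = 10


def initial_sample(files: Iterable[Iterable[str]],
                   max_bytes: int = DEFAULT_SAMPLE_BYTES,
                   max_files: int = DEFAULT_MAX_FILES) -> List[str]:
    """Collect leading rows until the byte budget or the file limit is reached."""
    rows: List[str] = []
    used = 0
    for index, file_rows in enumerate(files):
        if index >= max_files:
            break  # file-based sources: stop after the configured number of files
        for row in file_rows:
            size = len(row.encode("utf-8"))
            if used + size > max_bytes:
                return rows  # budget exhausted: the sample is the first rows only
            rows.append(row)
            used += size
    return rows  # dataset smaller than the budget: everything that was read
```

With a small dataset, the whole dataset is returned. With a large one, only the leading rows that fit in the budget are kept, which is why the initial sample may not be representative of later rows.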

When to Take a New Sample

...

NOTE: Generation of a new sample is executed as a job. Quick scan jobs are executed through Photon on the application node, while Full scan jobs are executed on an available clustered running environment. Depending on your deployment, there may be costs associated with generating a sample.

...

  • You are working with complex and wide datasets.
  • You have complex flows.
  • Your dataset has bad data or outliers that may require a different sample.
  • You have datasets with more than 10 MB of data.
  • You have added one or more steps with multi-dataset operations, such as a join, union, pivot, or lookup.

Change Sample Size

If you are encountering low-memory conditions related to sampling or wish to improve the performance of the sampling process, you can adjust the size of the samples that are displayed in the browser for your current recipe. For more information, see Change Recipe Sample Size.

Limitations

  • Advanced sampling options are available only with a full scan of the dataset.
  • Undo/redo do not change the sample state, even if the sample becomes invalid. 
  • When executed on the Photon running environment, samples taken from a dataset with parameters are limited to a maximum of 50 files.

Collect a New Sample

You can use the existing loaded sample, or you can collect a new sample to use.

...

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select the required type of sample. For more information, see Sample Types.

  3. In the Collect new sample panel, select either Quick or Full scan.
    1. Quick: Creates a sample by scanning part of the dataset, which yields quicker results.

      Tip: Quick scan samples are executed by default in the Photon running environment. If that environment is not available, the application may attempt to run the Quick scan job on an available clustered running environment.

    2. Full: Creates a sample by scanning the full dataset. This method takes longer, depending on the size of the dataset.

      Tip: Full scan samples are executed in the cluster running environment.

  4. Click Collect to collect the sample. A sample job ID is generated for each sample that you collect. When the sample is available, the Load Sample message is displayed in the Transformer page.
  5. Click Load Sample to start using the sample.
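
To make the difference between the two scan types concrete, the following sketch draws a random sample with reservoir sampling. It is an illustration only and does not describe how the product collects samples: the function `random_sample` and the `scan_limit` parameter are hypothetical. A `scan_limit` of `None` stands in for a Full scan, and an integer stands in for a Quick scan that reads only that many leading rows.

```python
import itertools
import random
from typing import Iterable, List, Optional


def random_sample(rows: Iterable[str], k: int,
                  scan_limit: Optional[int] = None,
                  seed: int = 0) -> List[str]:
    """Reservoir-sample up to k rows from the scanned portion of the dataset.

    scan_limit=None models a Full scan (every row is read).
    scan_limit=n models a Quick scan (only the first n rows are read).
    Both names are assumptions for this sketch.
    """
    rng = random.Random(seed)
    scanned = rows if scan_limit is None else itertools.islice(rows, scan_limit)
    reservoir: List[str] = []
    for seen, row in enumerate(scanned):
        if len(reservoir) < k:
            reservoir.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, seen)  # inclusive on both ends
            if j < k:
                reservoir[j] = row  # replace with probability k / (seen + 1)
    return reservoir
```

In this model, a Quick scan can only return rows from the part of the dataset that was read, while a Full scan can return any row at the cost of reading everything.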

Cancel a Sample

...

  1. To load the sample, click Load Sample.

Example - Random sample

Random samples can be generated from a quick or full scan of your dataset. 

...

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select Anomaly-based sample.
  3. In the Collect new sample panel, enter the following details:
    1. From the Scan column, select Quick. For more information, see "Collect a New Sample" above.
    2. Select the required column: Discount.
    3. For the anomaly type, select Find missing values only.
  4. Click Collect. A confirmation message is displayed.
  5. Click Load Sample. The Anomaly-based sample is loaded with the missing values for the Discount column.
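
As a rough illustration of what the Find missing values only option selects, the sketch below keeps rows whose value in a chosen column is missing. The function `missing_value_sample`, the `limit` parameter, and the rule that `None` or a blank string counts as missing are assumptions for this example, not the product's definition.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def missing_value_sample(rows: Iterable[Row], column: str,
                         limit: int = 100) -> List[Row]:
    """Return up to `limit` rows where `column` is absent, None, or blank."""
    sample: List[Row] = []
    for row in rows:
        value = row.get(column)
        if value is None or value.strip() == "":
            sample.append(row)
            if len(sample) >= limit:
                break  # stop once the sample is full
    return sample
```

For example, calling `missing_value_sample(rows, "Discount")` returns only the rows with no Discount value, which is the kind of slice that helps when you are writing steps to handle missing data.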

Cancel Sample

To cancel a sample collection, click the X next to the progress bar. The interrupted sample is listed as unavailable in the Collected samples panel. 

Load Sample

You can create as many samples as required based on your dataset. All collected samples are available in the Collected samples panel, where you can review and load them as required.

...

  1. In the Samples panel, click See all collected samples.
  2. From the Collected samples panel, select the required sample from the Available tab. For more information, see "Collected Samples" below.

    NOTE: Samples listed under the Unavailable tab are invalid for the current state of your recipe. You cannot select these samples for use.

  3. If you want to edit the sample name, click the Pencil icon next to the sample.

Delete Sample

After you have created a sample, you cannot delete it through the application.

NOTE: The product does not support deletion of samples after they have been created. For more information, contact your IT administrator.

Sample Types

...

Invalid Samples

NOTE: Samples are valid based on the state of your flow and recipe at the step where the sample was collected.

...

NOTE: If you revert to an earlier sample, then the additional steps between the point where that sample was generated and your current location in the recipe must be computed in the browser's memory. Browser performance may be impacted.

NOTE: If you modify a SQL statement for an imported dataset, any samples based on the old SQL statement are invalidated.

Collected Samples

The Collected samples panel stores the details of the samples collected for your dataset. To open it, in the Samples panel, click the See all collected samples link.

Figure: Collected samples


The Collected samples panel contains the following tabs:

  • Available: Displays the available samples that can be used. You can click Load to load the required sample. 
  • Unavailable: Displays the invalid samples, which cannot be selected for use. If subsequent steps make a sample valid again, it is moved to the Available tab.
  • All: Displays both the available and unavailable samples.
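
The relationship between the three tabs can be sketched as a simple filter over the collected samples. The validity rule used here is an assumption made for illustration, not the product's documented logic: a sample is treated as valid when the recipe still begins with the same steps that existed when the sample was collected. The names `Sample`, `is_valid`, and `collected_samples_tabs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    name: str
    # Recipe steps that existed up to the point where the sample was collected.
    steps_at_collection: Tuple[str, ...]


def is_valid(sample: Sample, recipe: Sequence[str]) -> bool:
    """Assumed rule: valid if the recipe still starts with the recorded steps."""
    n = len(sample.steps_at_collection)
    return tuple(recipe[:n]) == sample.steps_at_collection


def collected_samples_tabs(samples: Sequence[Sample],
                           recipe: Sequence[str]) -> Dict[str, List[str]]:
    """Group sample names the way the Available, Unavailable, and All tabs do."""
    available = [s.name for s in samples if is_valid(s, recipe)]
    unavailable = [s.name for s in samples if not is_valid(s, recipe)]
    return {
        "Available": available,
        "Unavailable": unavailable,
        "All": [s.name for s in samples],
    }
```

Because the tabs are recomputed from the current recipe, a sample that was unavailable reappears under Available once the recipe again matches the state in which the sample was collected.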

You can click the sample name to view the sample details.

Figure: Sample details

  • Load: Click Load to load the sample.
  • Rename: Click Rename to rename the sample.

Review Sample Jobs

You can review and manage your sample jobs in the same way as transformation jobs. For more information, see Sample Jobs Page.

Best Practices

For more information on best practices, troubleshooting, and browser crashes, see https://community.trifacta.com/s/article/Best-Practices-Managing-Samples-in-Complex-Flows.
