Create Samples Tool

Use Create Samples to split the input rows into 2 or 3 random samples. In the tool, you specify the percentage of rows that go into each sample. If the total is less than 100%, the remaining rows are output to the holdout (H) anchor.

Configure the Tool

  1. Select the Row Allocation. The sum of the Sample 1 and Sample 2 percentages must be less than or equal to 100%. If the total is less than 100%, the remaining percentage of rows is output to the H anchor:

    • Sample 1: Output to the E anchor. This is the percentage of the data to place in the estimation sample (between 1% and 99%).

    • Sample 2: Output to the V anchor. This is the percentage of the data to place in the validation sample (between 1% and 99%).

  2. Enter a Random seed: An integer value between 1 and 1000 that provides the starting point for generating random numbers. Changing this value changes which sample an individual row is placed in. Unless there is a specific reason to change it, the default value of 1 is recommended.
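The constraints in the steps above can be checked before a workflow is run. The following is a minimal Python sketch, not part of the tool itself: the function name check_row_allocation and its parameters are hypothetical, and it only encodes the ranges stated above.

```python
def check_row_allocation(sample1_pct, sample2_pct, seed=1):
    """Validate the settings and return the holdout (H anchor) percentage.

    Hypothetical helper: it mirrors the documented limits only.
    """
    # Each sample percentage must lie between 1 and 99.
    for name, pct in (("Sample 1", sample1_pct), ("Sample 2", sample2_pct)):
        if not 1 <= pct <= 99:
            raise ValueError(f"{name} must be between 1 and 99, got {pct}")
    # Together the two samples cannot exceed 100%.
    if sample1_pct + sample2_pct > 100:
        raise ValueError("Sample 1 + Sample 2 must not exceed 100")
    # The seed is an integer between 1 and 1000 (default 1).
    if not (isinstance(seed, int) and 1 <= seed <= 1000):
        raise ValueError("Random seed must be an integer between 1 and 1000")
    # Whatever is left over goes to the holdout sample.
    return 100 - sample1_pct - sample2_pct
```

For example, check_row_allocation(60, 20) returns 20, meaning 20% of the rows would go to the H anchor.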

View the Output

There are 3 outputs from the Create Samples tool:

  • E anchor: The Estimation output stream contains a random sample of input rows. The number of rows in this stream equals the Sample 1 percentage of the total input rows.

  • V anchor: The Validation output stream contains a random sample of input rows. The number of rows in this stream equals the Sample 2 percentage of the total input rows.

  • H anchor: The Holdout stream includes any leftover rows that were not placed in either the Estimation or Validation samples.

If there is an odd number of rows and Estimation and Validation are both set to 50%, the E anchor output stream has one more row than the V anchor stream.
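The three outputs can be illustrated with a short Python sketch. This is an approximation under stated assumptions, not the tool's actual algorithm: the function name create_samples is hypothetical, Python's own random generator is used (so row assignments will not match the tool's for the same seed), and the rounding rule (round Estimation up, Validation down) is an assumption chosen only because it reproduces the odd-row 50/50 behavior described above.

```python
import random


def create_samples(rows, sample1_pct, sample2_pct, seed=1):
    """Split rows into (estimation, validation, holdout) lists.

    Hypothetical sketch; rounding and randomization are assumptions.
    """
    if sample1_pct + sample2_pct > 100:
        raise ValueError("Sample 1 + Sample 2 must not exceed 100")

    n = len(rows)
    # Shuffle row positions with a seeded generator so the split is repeatable.
    order = list(range(n))
    random.Random(seed).shuffle(order)

    # Assumed rounding: Estimation rounds up, Validation rounds down.
    # Integer arithmetic avoids floating-point surprises.
    n_e = (n * sample1_pct + 99) // 100
    n_v = (n * sample2_pct) // 100

    # Sort each group of positions so rows keep their input order.
    e_idx = sorted(order[:n_e])
    v_idx = sorted(order[n_e:n_e + n_v])
    h_idx = sorted(order[n_e + n_v:])

    pick = lambda idx: [rows[i] for i in idx]
    return pick(e_idx), pick(v_idx), pick(h_idx)


if __name__ == "__main__":
    e, v, h = create_samples(list(range(5)), 50, 50)
    print(len(e), len(v), len(h))  # 3 2 0
```

Every input row lands in exactly one of the three lists, and calling the function again with the same seed gives the same split.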