Sampling

The Sample operation selects a random sample from a dataset.

Written by Tomas Larsson
Updated over a week ago

Keywords: sample

The Sample operation creates a sample of data by randomly selecting a user-specified number of rows from a dataset.

Step-by-step guide

1. Open the operation configuration window

In the Schema workbench, select the dataset you want to apply the operation to, then click the "Add operation" button at the top of the workspace.

Search for "Sample" or find the operation under "Preprocessing & cleaning" and click it.

2. Specify sample size

Under "Size", type the number of rows to be sampled from the dataset.

3. Specify seeding

Seeding can be used to "freeze" the randomness of the sampling, so that you get the same sample every time you run the pipeline. Leave "Seed" blank if you want the sample to be selected at random each time the pipeline is run.
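
To illustrate what the size and seed settings do conceptually, here is a minimal Python sketch using the standard library. It is not the product's implementation: the function name sample_rows and its parameters are hypothetical, and it assumes rows are drawn without replacement, which this guide does not specify.

```python
import random


def sample_rows(rows, size, seed=None):
    """Return `size` rows chosen at random (assumed: without replacement)."""
    # A dedicated generator keeps the seed from affecting other random calls.
    # With seed=None the generator is seeded from the system, so each run differs.
    rng = random.Random(seed)
    return rng.sample(rows, size)


# Hypothetical dataset of 100 rows.
rows = [{"id": i} for i in range(100)]

# The same seed yields the same sample on every run.
first = sample_rows(rows, size=5, seed=42)
second = sample_rows(rows, size=5, seed=42)
print(first == second)

# Leaving the seed blank draws a fresh sample each time.
unseeded = sample_rows(rows, size=5)
print(len(unseeded))
```

With a fixed seed the two samples are identical, which mirrors the "frozen" behavior described above; without a seed, repeated runs will generally return different rows.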

4. Apply the operation

Click "Apply" to run the operation. This generates a new dataset containing the sample.
