Keywords: sample
The Sample operation creates a sample of data by randomly selecting a user-specified number of rows from a dataset.
Step-by-step guide
1. Open the operation configuration window
In the Schema workbench, select the dataset that you want to apply the operation to, then click the "Add operation" button at the top of the workspace.
Search for "Sample" or find the operation under "Preprocessing & cleaning" and click it.
2. Specify sample size
Under "Size", type the number of rows to be sampled from the dataset.
3. Specify seeding
Seeding "freezes" the randomness of the sampling, so that you get the same sample every time you run the pipeline. To use it, enter a value under "Seed". Leave "Seed" blank if you want a new random sample to be selected each time the pipeline is run.
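To illustrate how a seed makes a random sample repeatable, here is a minimal Python sketch. It is not the operation's actual implementation: the function name sample_rows, the example data, and the choice to sample without replacement are all assumptions made for illustration.

```python
import random

def sample_rows(rows, size, seed=None):
    """Return `size` rows picked at random (hypothetical helper).

    seed=None corresponds to leaving "Seed" blank: the selection can
    differ on every call. Any fixed seed repeats the same selection.
    Assumption: rows are drawn without replacement.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(rows, size)

# Hypothetical dataset of 100 rows.
rows = [{"id": i} for i in range(100)]

first = sample_rows(rows, 5, seed=42)
second = sample_rows(rows, 5, seed=42)
unseeded = sample_rows(rows, 5)  # may change from run to run

print(first == second)  # same seed, same sample
```

With the same seed and the same input rows, both calls return identical samples. Changing the seed, or omitting it, gives a different selection.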
4. Apply the operation
Click "Apply" to run the operation. This generates a new dataset containing the sample.