Classify text (zero-shot)

Predicts labels with BART-based zero-shot classifiers, with no need to train a text classifier beforehand.

Written by Zafer Çavdar

The classify text (zero-shot) operation calculates relevance scores for arbitrarily defined labels based on the input text, without training a text classifier beforehand. It is used primarily for content-based classification.

Step-by-step guide

1. Open the operation configuration window

Click the "Add operation" button at the top of the workspace.

Search for "Classify text (zero-shot)" or find the operation under "Text enrichment" and click it.

2. Specify the language

In the "Language" drop-down, select the language of your input text. Currently, you can use this operation in 15 different languages.

3. Add labels

Multi-label mode scores each label independently of the other labels, and each label gets a relevance score between 0 and 1.0. At least one label is required when multi-label mode is enabled. When multi-label mode is disabled, the labels are treated as mutually exclusive categories and their scores are calculated together, so at least two labels are required.

To add labels, type a label and hit enter. This field supports entering multiple labels.
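The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of how BART-based NLI zero-shot classifiers are commonly scored, not a description of this operation's internals. The function `score_labels` and the example logits are hypothetical: each label is assumed to come with a (contradiction, entailment) logit pair produced by the model.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def score_labels(logits, multi_label):
    """Turn per-label NLI logits into relevance scores.

    `logits` maps label -> (contradiction_logit, entailment_logit).
    This input shape is an assumption made for illustration.
    """
    if multi_label:
        if len(logits) < 1:
            raise ValueError("multi-label mode needs at least one label")
        # Each label is scored on its own: entailment vs. contradiction.
        return {label: softmax([c, e])[1] for label, (c, e) in logits.items()}
    if len(logits) < 2:
        raise ValueError("single-label mode needs at least two labels")
    # Labels compete with each other: softmax over the entailment logits.
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))

# Hypothetical logits for one input text.
example_logits = {
    "sports": (-1.0, 3.0),
    "politics": (0.5, 0.2),
    "cooking": (2.0, -2.0),
}
independent = score_labels(example_logits, multi_label=True)
exclusive = score_labels(example_logits, multi_label=False)
```

In this sketch, adding or removing a label does not change the other labels' scores in multi-label mode, whereas in single-label mode the scores share a total of 1, which is why a single label on its own would be meaningless there.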

4. Auto-detect score threshold

The score threshold is the cutoff for filtering labels: any label whose relevance score falls below the threshold is removed from the output. Turn on "Auto-detect score threshold" to have the operation find an input-specific optimal threshold for filtering labels.
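As a rough illustration of threshold filtering, the sketch below keeps only labels whose score reaches a cutoff. How the operation auto-detects its threshold is not described in this guide, so `largest_gap_threshold` is a purely hypothetical heuristic (cut at the widest gap between sorted scores), included only to show what an input-specific threshold could look like. All names and scores here are made up.

```python
def filter_by_threshold(scores, threshold):
    """Keep labels whose score is at least `threshold`."""
    return {label: s for label, s in scores.items() if s >= threshold}

def largest_gap_threshold(scores):
    """Hypothetical auto-detection: place the cutoff in the middle of the
    widest gap between neighbouring scores. Not the product's method."""
    ordered = sorted(scores.values(), reverse=True)
    if len(ordered) < 2:
        return 0.0  # nothing to separate, keep everything
    gaps = [(ordered[i] - ordered[i + 1], i) for i in range(len(ordered) - 1)]
    _, widest = max(gaps)
    return (ordered[widest] + ordered[widest + 1]) / 2

# Hypothetical multi-label scores for one input text.
scores = {"sports": 0.91, "politics": 0.85, "cooking": 0.12, "travel": 0.05}
threshold = largest_gap_threshold(scores)
kept = filter_by_threshold(scores, threshold)
```

With these made-up scores the widest gap lies between "politics" and "cooking", so only the two high-scoring labels survive the filter.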

5. Name the output field

Under "Output field name", type the name of the output field.

6. Apply the operation

Click "Apply" to run the operation. The text data is now classified according to the labels, and the output appears in the schema as "Scores". To review the output, create a Table View workbench and add the dataset to it. The Scores field has three subfields: id, label, and score. The label subfield holds the label assigned to the text based on its content, and the score subfield holds the relevance score of that label.
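If you export the results and process them elsewhere, the Scores field could be handled roughly as follows. This sketch assumes that each row's Scores value is a list of records with the id, label, and score subfields; the exact exported shape, the meaning of id, and the helper `top_label` are assumptions made for illustration.

```python
def top_label(row_scores):
    """Return the record with the highest score, or None when every
    label was filtered out by the score threshold."""
    return max(row_scores, key=lambda record: record["score"], default=None)

# Hypothetical Scores value for a single row of the dataset.
row_scores = [
    {"id": 0, "label": "sports", "score": 0.91},
    {"id": 1, "label": "politics", "score": 0.85},
]

best = top_label(row_scores)
best_label = best["label"] if best is not None else None
```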
