Keywords: keyword extraction, TextRank
While the Tokenize & Tag operation splits a text into words and phrases, Extract Keywords returns the words that best describe the text.
Step-by-step guide
1. Open the operation configuration window
Select the field with the text that you want to extract keywords from and click the "Add operation" button at the top of the workspace.
Search for "Extract Keywords" or find the operation under "Content extraction" and click it.
2. Name the output field
Under "Output field name", type the name of the output field.
3. Specify the method
Under "Method", select the method used to identify the most important words. There are two options:
TextRank applies the PageRank algorithm to a word co-occurrence network and measures each word's centrality (a form of eigenvector centrality) in that network. Words that co-occur with other central words get a high TextRank score.
Montemurro and Zanette entropy is especially useful for text with more than a thousand words. It uses information theory to identify words with a large contribution to the overall information in the text. The method is described in this paper.
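To make the TextRank option more concrete, here is a minimal Python sketch of the general idea. It is not the code behind the Extract Keywords operation. The function name textrank_keywords, the co-occurrence window, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration, and the sketch skips steps such as stop-word removal and part-of-speech filtering that a real implementation would likely include.

```python
import re
from collections import defaultdict


def textrank_keywords(text, num_keywords=5, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a co-occurrence graph.

    Illustrative only: all parameter names and defaults are assumptions,
    not the settings used by the Extract Keywords operation.
    """
    # Very simple tokenization: lowercase alphabetic runs.
    words = re.findall(r"[a-z]+", text.lower())

    # Undirected co-occurrence graph: two distinct words are linked
    # if they appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    if not neighbors:
        return []

    # Iterate the PageRank update: a word's score is fed by the scores
    # of its neighbors, each split evenly across that neighbor's links.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:num_keywords]


if __name__ == "__main__":
    sample = (
        "Keyword extraction finds the words that best describe a text. "
        "TextRank builds a network of words and ranks the words by centrality."
    )
    print(textrank_keywords(sample, num_keywords=3))
```

The num_keywords argument plays the same role as the "Number of keywords" setting in step 4: it limits how many of the top-ranked words are returned for each text.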
4. Specify the number of keywords to be extracted
Under "Number of keywords", type the number of keywords that you want to extract from each text.
5. Apply the operation
Click "Apply" to run the operation. The extracted keyword collections are inserted into the output field.