Keywords: word cloud, word count
Word clouds are useful for getting a quick overview of the content of the text by displaying words and their frequencies. Use the Bubble View to aggregate and display words (or any other categorical data).
☝️ Note: Make sure your text field of interest has the data type "long text". If not, change the field type before getting started. Unless you trust your text to be clean and tidy, we also recommend using the Preprocessing Wizard to reduce noise in the input data before generating a word cloud.
💡 Quick tip: The fastest way to turn text into a word cloud is to drag your text field to the "Words as bubbles" drop zone in the Bubble View. This triggers the Tokenize & Tag operation, filters tokens by part-of-speech, and aggregates the tokens. If you want full control over the tokenization parameters and the filters applied, or want the tokens to be part of your global dataset, follow the step-by-step guide below instead.
1. Tokenize your text
Select your text field and apply the Tokenize & Tag operation to split each text into words and phrases. To reduce noise, make sure "Remove punctuation", "Remove stop words", "Remove XML tags", and "Lowercase" are switched on. To be able to filter by part-of-speech, make sure the "Part-of-speech" tagging option is switched on. To display both words and phrases in your word cloud, turn on the "Phrase detection" tagging option.
Once the operation has run on your text field, you will have a new field containing a collection of tokens for each text.
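To make the effect of these cleaning options concrete, here is a minimal sketch in plain Python. It is not the Tokenize & Tag implementation; the `tokenize` function and the tiny `STOP_WORDS` list are hypothetical and only illustrate what lowercasing and removing XML tags, punctuation, and stop words do to raw text.

```python
import re

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "are"}

def tokenize(text):
    """Strip XML-like tags, lowercase, drop punctuation and stop words."""
    no_tags = re.sub(r"<[^>]+>", " ", text)             # "Remove XML tags"
    words = re.findall(r"[a-z0-9']+", no_tags.lower())  # "Lowercase" + "Remove punctuation"
    return [w for w in words if w not in STOP_WORDS]    # "Remove stop words"

texts = ["<p>The cat sat in the garden.</p>", "Cats and dogs are friends!"]

# One collection of tokens per text, mirroring the new tokens field.
tokens_per_text = [tokenize(t) for t in texts]
print(tokens_per_text)
```

Each input text yields its own list of tokens, which is the shape the later aggregation steps work on.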
2. Add a Bubble / Word view workbench
To create a Bubble / Word view, click "Add workbench" and click the Bubble / Word view symbol. A Bubble / Word view workbench is now added to the workspace.
3. Aggregate and display tokens
Drag the field containing your token values (by default "tokens.value") from the Schema workbench (or the corresponding column in the Table View) to the "Display as bubbles" drop zone in the Bubble View.
The tokens will now be aggregated and displayed as bubbles. By default, the bubble radius reflects the token count, that is, the number of times each word or phrase has been used in the text. The aggregation function can be changed in the Settings Bar or by dragging a numeric field to the "Use for radius" drop zone.
☝️ Note: In some cases, counting the number of texts or sources in which a word is present can be more relevant than counting the total number of mentions of the word. Consider, for example, a situation where a few article authors are very actively using a word; while the word might appear dominant in a word cloud where the size is based on word count, it would be small when the size is based on source count. Changing the aggregation function can also be useful for incorporating other information in a word cloud, such as the average sentiment or engagement scores of words.
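The difference between the two counts can be sketched in a few lines of Python. The data below is made up for illustration, and the code only mirrors the idea of the two aggregation functions, not how the Bubble View computes them.

```python
from collections import Counter

# Hypothetical tokens for three texts; one author uses "tax" heavily.
tokens_per_text = [
    ["tax", "tax", "tax", "tax", "budget"],
    ["budget", "school"],
    ["budget", "school", "health"],
]

# Token count: every mention counts.
token_count = Counter(tok for toks in tokens_per_text for tok in toks)

# Source count: each text counts at most once per token.
source_count = Counter(tok for toks in tokens_per_text for tok in set(toks))

print(token_count["tax"], source_count["tax"])        # 4 1
print(token_count["budget"], source_count["budget"])  # 3 3
```

Sized by token count, "tax" would be the largest word; sized by source count, it would be among the smallest, while "budget" stays prominent because it appears in every text.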
4. Switch to word mode
To view the tokens as words rather than bubbles, switch from "Bubble" to "Word" mode in the workbench header.
The tokens are now displayed as a word cloud.
5. Filter by part-of-speech
Not all words convey the same degree of meaning. Typically, nouns, proper nouns, and adjectives capture more of the meaning in text than, for example, prepositions and conjunctions. To filter words by part-of-speech, click the filter icon next to your tokens value field in the Schema workbench and set the filter to include the parts-of-speech of interest.
👉 Example. If the tokens field is named "tokens" and you want to include only tokens that are nouns, proper nouns, or adjectives, use the following filter settings:
Include the rows in tokens.value where tag contains any of selected [noun, proper noun, adjective]
Once the filter has been applied, the word cloud will be updated to include only tokens with the selected parts-of-speech.
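Conceptually, the filter keeps a token only if its tag is in the selected set. The following Python sketch assumes each token is a record with a `value` and a `tag`; that structure and the tag names are assumptions for illustration, not the tool's internal format.

```python
# Hypothetical tagged tokens, one dict per token.
tokens = [
    {"value": "stockholm", "tag": "proper noun"},
    {"value": "in", "tag": "preposition"},
    {"value": "beautiful", "tag": "adjective"},
    {"value": "city", "tag": "noun"},
    {"value": "and", "tag": "conjunction"},
]

# Parts-of-speech selected in the filter.
INCLUDED_TAGS = {"noun", "proper noun", "adjective"}

# Keep only the values whose tag is among the selected ones.
kept = [t["value"] for t in tokens if t["tag"] in INCLUDED_TAGS]
print(kept)  # ['stockholm', 'beautiful', 'city']
```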
To better understand the context in which words are used, particularly how they co-occur with each other in the text, create a word network.
To see the documents that use one or a combination of words, select the word or words that you are interested in and drag them to the "Score by an occurrence" or "Score by similarity" drop zones in the Table View, where your documents are displayed.