Keywords: clean text, remove emojis, remove URLs, remove XML tags, remove line breaks, remove tabs, remove punctuation, remove @ tags, remove hashtag.
Text often contains elements that add noise and are not relevant to the analysis. The Clean Text operation lets you select which elements to remove from the text.
Step-by-step guide
1. Open the operation configuration window
Select the field with the text that you want to clean and click the "Add operation" button at the top of the workspace.
Search for "Clean Text" or find the operation under "Preprocessing & cleaning" and click it.
2. Name of the output field
Under "Output field name", type the name of the output field.
3. Choose elements to be cleaned
Select one or more elements to be removed:
Emojis
URLs
Line breaks
Hashtag sequence ending (useful for removing a sequence of hashtags at the end of texts, which are often used as content tags, while keeping hashtags inside the text, which are often needed for context)
XML tags
Tabs
Punctuation
Prefixes (for example "#" to remove the hash symbol from hashtags)
Words with prefixes (for example "#" to remove entire hashtags)
Substrings (to remove a particular string of characters), with the choice to make the matching case-sensitive or case-insensitive
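To make the difference between some of these options concrete, here is a rough Python sketch of what they might do to a text. It is only an illustration: the function names and regular expressions are hypothetical, and the Clean Text operation itself may match elements differently.

```python
import re

# Hypothetical patterns for illustration only; not the operation's actual rules.
URL_RE = re.compile(r"https?://\S+")
XML_TAG_RE = re.compile(r"<[^>]+>")
# One or more hashtags, separated by whitespace, at the very end of the text.
TRAILING_HASHTAGS_RE = re.compile(r"(?:\s*#\w+)+\s*$")


def remove_trailing_hashtags(text):
    """Drop a run of hashtags at the end; keep hashtags inside the text."""
    return TRAILING_HASHTAGS_RE.sub("", text)


def remove_prefix(text, prefix="#"):
    """Drop only the prefix character at the start of a word, keep the word."""
    return re.sub(r"(?<!\S)" + re.escape(prefix) + r"(?=\w)", "", text)


def remove_prefixed_words(text, prefix="#"):
    """Drop the whole word that starts with the prefix."""
    return re.sub(r"(?<!\S)" + re.escape(prefix) + r"\w+", "", text)


def remove_substring(text, substring, case_sensitive=True):
    """Drop every occurrence of a substring, optionally ignoring case."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.sub(re.escape(substring), "", text, flags=flags)


post = "Loving the #sunset tonight #travel #photo"
print(remove_trailing_hashtags(post))  # Loving the #sunset tonight
print(remove_prefix(post))             # Loving the sunset tonight travel photo
print(remove_substring("Buy NOW, buy now!", "buy ", case_sensitive=False))  # NOW, now!
```

In this sketch, "Hashtag sequence ending" keeps "#sunset" because it sits inside the sentence, while "Prefixes" strips every "#" and leaves the words in place.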
4. Run the operation
Click "Apply" to run the operation. The cleaned text is written to the output field.