Keywords: content analysis, free-form text responses, survey analysis
This tutorial describes a process for identifying topics and sentiment in free-form text responses. For this example, the public Community Survey Open-ended Comments (2016 & 2017) dataset from the City of Austin is used. Respondents have answered the question "If there was ONE thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?"
The output of the analysis includes:
A table with topics in the responses along with their volume, sentiment, and representative texts.
A foam chart displaying the topics visually.
Step-by-step guide
Step 1: Import the data
Open a new project and click "Import data from file". Upload the file with your dataset if you have not already done so, select the file, and click "Select file". Select the fields to be imported (make sure to include the field with your free-form text responses) and click "Import". The data will now be imported into your project.
Shortcut: Instant topic detection
Instead of running tokenization and topic detection one by one, you can drag the field with free-form text responses from the Schema workbench to the "Detect topics" drop zone in the Foam Chart workbench. Tokenization and topic detection will then run with default settings. To change them, expand the Manage operations sidebar, adjust the operation settings, and click "Apply".
Step 2: Tokenize the free-form text responses
Split the free-form text responses into words and phrases by using the Tokenize and tag operation. With the current dataset, it takes about one minute for the operation to run with Dcipher's Wizard plan.
A field with token collections has now been added to the dataset.
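To make the idea of a token collection concrete, here is a minimal Python sketch of splitting a comment into words and two-word phrases. It is only an illustration: the `tokenize` function and its `max_n` parameter are hypothetical, and the sketch does not reproduce what Dcipher's Tokenize and tag operation does internally (for example, it does no tagging).

```python
import re


def tokenize(text, max_n=2):
    """Return lowercase word tokens followed by n-gram phrases of up to max_n words."""
    # Hypothetical helper for illustration; not Dcipher's tokenizer.
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = list(words)
    for n in range(2, max_n + 1):
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens


print(tokenize("Fix the traffic lights"))
```

Each comment ends up with a list of tokens, which is the kind of per-comment collection that later steps take as input.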
Step 3: Run topic modeling
Run topic modeling on the level of comments (root level in the current example) and with the tokens (the "tokens.value" field) as input using Dcipher's Detect topics operation. With the current dataset, it takes about one minute for the operation to run with Dcipher's Wizard plan.
As the method, select "Semantic clustering", which, unlike LDA and CorEx, uses semantic information to identify topics in the data. This means it can spot similar content across comments regardless of whether they use the same keywords.
Topic broadness can be increased to get broader themes rather than more specific topics. In the current example the default setting, which looks for specific topics, is used.
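The following Python sketch illustrates the general idea of grouping comments by meaning rather than by shared keywords. It rests on several assumptions: the two-dimensional `EMBEDDINGS` table is invented toy data standing in for learned word vectors, and the greedy `cluster` function with its `threshold` parameter is a simple stand-in technique, not the algorithm behind Dcipher's Detect topics operation.

```python
import math

# Hypothetical toy "embeddings"; real systems use learned vectors with many dimensions.
EMBEDDINGS = {
    "traffic": (1.0, 0.0),
    "congestion": (0.9, 0.1),
    "roads": (0.8, 0.2),
    "rent": (0.0, 1.0),
    "housing": (0.1, 0.9),
    "affordability": (0.2, 0.8),
}


def embed(tokens):
    """Average the vectors of known tokens; return None if no token is known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cluster(docs, threshold=0.8):
    """Greedy clustering: a document joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [document indices])
    for i, tokens in enumerate(docs):
        vec = embed(tokens)
        if vec is None:
            continue  # no known tokens, so the document is left unassigned
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


docs = [
    ["traffic", "congestion"],
    ["housing", "affordability"],
    ["roads"],
    ["rent"],
]
print(cluster(docs))  # [[0, 2], [1, 3]]
```

Documents 0 and 2 land in the same group although they share no token, because their vectors point in a similar direction. In this sketch, lowering `threshold` merges groups into fewer and wider ones, which is loosely analogous to asking for broader themes.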
Step 4: Run sentiment analysis
Run sentiment analysis on the free-form text responses using Dcipher's Analyze sentiment operation. With the current dataset, it takes about half a minute for the operation to run with Dcipher's Wizard plan.
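As a rough illustration of what a per-comment sentiment score is, the Python sketch below counts words from two small word lists. Everything in it is an assumption made for illustration: the `POSITIVE` and `NEGATIVE` lists, the `sentiment_score` function, and the score range of -1 to 1. Dcipher's Analyze sentiment operation is not described in this tutorial and likely works differently.

```python
import re

# Hypothetical word lists; a real lexicon or model would be far larger.
POSITIVE = {"love", "great", "good", "thank"}
NEGATIVE = {"terrible", "bad", "worse", "unaffordable"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no sentiment-bearing words found
    return (pos - neg) / (pos + neg)


print(sentiment_score("Traffic is terrible and getting worse"))  # -1.0
print(sentiment_score("I love the parks"))                       # 1.0
```

A numeric score per comment is what makes it possible to average sentiment per topic later on.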
Step 5: Display the topics as a foam chart
Click "Add workbench" at the top of the workspace and click Foam Chart to add a foam chart to the workspace.
Drag the "topics" field from the Schema workbench to the "Display topics" drop zone in the *Foam Chart* workbench to display the topics. Enlarge the foam chart to give the topics more space. Under "View", enable "Flatten subgroups" and "Hide labels" for better visibility. You can adjust "Min. sub-group diameter" to increase the size threshold for the topics to be displayed.
The chart can be downloaded by clicking the three-dot "More" icon in the workbench header and then clicking "Download". Type the export file name, set the file format to "png", and click "Save".
Step 6: Display the topics in table form
To arrange the topics and their information in table form, drag the field "topics.label" from the Schema workbench to the *Group by field* drop zone in Table View. The topics and their number of comments are now displayed in the table. To add columns to the table, drag fields to the "Apply function" drop zone in Table View.
In the current example:
Average sentiment. Dragging the "sentiment" field to the *Apply function* drop zone and selecting "Average" as the function adds a column with the average sentiment score for each topic.
Related council districts. Dragging the "Council District" field to the *Apply function* drop zone and selecting "Overrepresented units" as the function identifies the council districts that are most overrepresented in relation to each topic (based on the extent to which respondents in a certain district are more prone to mention a certain topic).
Illustrative responses. Dragging the "Comments" field to the "Apply function" drop zone and selecting "Calculate representative texts" as the function displays a selection of texts that are representative of each topic. Click the arrow in the column header and click "Flatten" to display all five columns in the table.
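The grouping and per-topic functions described above can be sketched in Python as follows. The `rows` data is invented for illustration, and the `summarize` function is hypothetical. In particular, this tutorial does not specify how "Overrepresented units" is calculated, so the sketch assumes a simple lift ratio: a district's share of comments within a topic divided by its share of all comments.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (topic label, sentiment score, council district)
rows = [
    ("traffic", -0.5, 1),
    ("traffic", -1.0, 1),
    ("traffic", 0.0, 2),
    ("housing", -0.5, 2),
    ("parks", 1.0, 2),
    ("parks", 0.5, 2),
]


def summarize(rows):
    """Group rows by topic and compute count, average sentiment, and top district by lift."""
    by_topic = defaultdict(list)
    for topic, score, district in rows:
        by_topic[topic].append((score, district))

    district_totals = Counter(district for _, _, district in rows)
    total = len(rows)

    summary = {}
    for topic, items in by_topic.items():
        n = len(items)
        avg = sum(score for score, _ in items) / n
        in_topic = Counter(district for _, district in items)
        # Assumed metric: share of the district within the topic / its overall share.
        lift = {d: (c / n) / (district_totals[d] / total) for d, c in in_topic.items()}
        summary[topic] = {
            "comments": n,
            "avg_sentiment": avg,
            "most_overrepresented": max(lift, key=lift.get),
        }
    return summary


for topic, stats in summarize(rows).items():
    print(topic, stats)
```

With this toy data, district 1 wrote two of the three traffic comments but only a third of all comments, so it comes out as the most overrepresented district for that topic.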
The table can be exported from the three-dot "More" menu in the Table View header.
More information
For more information about analyzing free-form text responses in surveys, see this blog post.