Keywords: scatter plot, XY plot, x coordinate, y coordinate
A scatter plot displays data along two dimensions and is useful for showing the relationship between two sets of data. Color and size can be used to display additional information.
You can use this workbench in different ways depending on what data you input and what you want to display:
Numeric fields: If you have numeric fields representing x and y coordinates, drag them to the "X coordinate" and "Y coordinate" drop zones, respectively.
Vector field: If you have a vector field (for example, the result of the Reduce Vectors operation), drag it to the "X and Y coordinates" drop zone to display the first two vector dimensions on the x and y axes.
Text field: Dragging a text field to the "Document landscaping" drop zone triggers a sequence of operations that vectorizes and reduces the text to display it in the form of a two-dimensional document landscape.
Categorical field: Categorical fields can be dropped in the "Landscaping based on categorical data" drop zone. This generates a landscape similar to that of Document landscaping, but where the vectors are based on the categorical input rather than text embeddings.
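As a rough illustration of the first two input modes, the Python sketch below shows how two numeric fields, or the first two dimensions of a vector field, could each yield (x, y) points. The record layout and the field names "x", "y", and "embedding" are hypothetical, and this is not the workbench's actual implementation.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"x": 1.0, "y": 2.0, "embedding": [0.5, -0.1, 0.9]},
    {"x": 3.0, "y": 4.0, "embedding": [0.2, 0.7, -0.3]},
]


def points_from_numeric(rows, x_field, y_field):
    """Use two numeric fields directly as x and y coordinates."""
    return [(row[x_field], row[y_field]) for row in rows]


def points_from_vector(rows, vector_field):
    """Use the first two dimensions of a vector field as x and y."""
    return [(row[vector_field][0], row[vector_field][1]) for row in rows]


numeric_points = points_from_numeric(records, "x", "y")
vector_points = points_from_vector(records, "embedding")
```

Any vector dimensions beyond the second are simply ignored in this sketch, which is why a vector field is typically reduced to two dimensions before plotting.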
Step-by-step guide
1. Add a Scatter Plot workbench
Click the "Add workbench" button at the top of the workspace and click the "Scatter Plot" symbol. This adds a Scatter Plot workbench to the workspace.
2. Specify x coordinates
Drag the numeric field you would like to use for the x coordinates to the "X coordinate" drop zone in the Scatter Plot workbench.
3. Specify y coordinates
Drag the numeric field you would like to use for the y coordinates to the "Y coordinate" drop zone in the Scatter Plot workbench.
4. Specify labels
Drag the field containing the labels to the "Display as labels" drop zone in the Scatter Plot workbench.
5. Specify color value
To color dots by value, drag the field you want to use to the "Color by value" drop zone in the Scatter Plot workbench.
6. Explore the data
The labels are shown along with the x and y coordinates when hovering over a dot. You can zoom into areas of interest, for example, the dense part of the plot. Click a dot to select it and drag it to a different workbench if you want to work with it there.
Example: Number of retweets vs. number of likes for tweets
In this example, our dataset contains tweets with fields including the number of likes, number of retweets, message, and tweet type for each tweet. We would like to create a scatter plot that shows the number of retweets and the number of likes of each tweet.
To do this, we drag "retweet_count" to the "X coordinate" drop zone. We then drag "like_count" to the "Y coordinate" drop zone. Tweets are now displayed as dots in the scatter plot. The horizontal axis shows the number of retweets and the vertical axis shows the number of likes.
To label the dots, we select the field containing the labels, in this case the "message" field, which contains the texts of the posts. We drag it to the "Display as labels" drop zone.
We would like to use colors to display the tweet type, so we drag "type" to the "Color by value" drop zone.
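For readers who think in code, the mapping set up so far can be sketched in Python: each tweet becomes a dot with an x value, a y value, a label, and a color chosen per category. The sample tweets, the "type" values, and the palette are made up for illustration, and the way the workbench actually assigns colors is an assumption here.

```python
# Made-up sample tweets; only the field names come from the example above.
tweets = [
    {"retweet_count": 12, "like_count": 40, "message": "Hello world", "type": "original"},
    {"retweet_count": 3, "like_count": 5, "message": "Nice thread", "type": "reply"},
    {"retweet_count": 7, "like_count": 21, "message": "Worth reading", "type": "original"},
]

# Hypothetical palette; any list of color names would do.
PALETTE = ["tab:blue", "tab:orange", "tab:green"]


def assign_colors(rows, field, palette):
    """Give each distinct value of `field` a color, in order of first appearance."""
    colors = {}
    for row in rows:
        value = row[field]
        if value not in colors:
            # Cycle through the palette if there are more values than colors.
            colors[value] = palette[len(colors) % len(palette)]
    return colors


color_by_type = assign_colors(tweets, "type", PALETTE)

# One dot per tweet: retweets on x, likes on y, message as label, type as color.
dots = [
    {
        "x": t["retweet_count"],
        "y": t["like_count"],
        "label": t["message"],
        "color": color_by_type[t["type"]],
    }
    for t in tweets
]
```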
The scatter plot now displays the number of retweets and the number of likes of the tweets in the dataset, with each tweet colored by type and labeled with its content. If the dots are packed densely in a small part of the chart, zoom in for a clearer view or tick "Show heatmap" under "View options" to display a density plot.
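How "Show heatmap" computes density is not documented here. One common approach, sketched below in Python as an assumption rather than a description of the workbench, is to divide the plot into square cells and count the dots in each cell. The function name, the cell size, and the sample points are all hypothetical.

```python
from collections import Counter


def density_grid(points, cell_size):
    """Count how many points fall into each square cell of side `cell_size`."""
    counts = Counter()
    for x, y in points:
        # Floor division maps a coordinate to the index of its cell.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Made-up (retweets, likes) pairs: four dots near the origin, one outlier.
points = [(1, 2), (2, 3), (3, 1), (40, 55), (4, 4)]
grid = density_grid(points, cell_size=10)
```

Cells with higher counts would be drawn in a more intense color, which makes a dense cluster visible even when the individual dots overlap.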