5 The Grand Visualization
Note: There are more clusters than available colors, so colors are recycled across clusters, but they should still help. The cluster number is included in each tooltip for certainty.
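One possible way to reduce color recycling is to build a palette with exactly one color per cluster level. The sketch below uses only base R and a made-up vector of cluster labels (`cluster_ids` and `clusters_demo` are hypothetical stand-ins for `clus$cluster`, not part of the analysis). With many clusters, neighboring hues will still look alike, so the cluster number in the tooltip remains useful.

```r
# Hypothetical cluster labels standing in for clus$cluster (illustration only)
cluster_ids <- c(0, 1, 1, 2, 3, 3, 3, 4, 5, 5)
clusters_demo <- factor(cluster_ids)

# Build one color per factor level from a qualitative HCL palette
n_levels <- nlevels(clusters_demo)
pal <- grDevices::hcl.colors(n_levels, palette = "Dark 3")

# Inspect the palette; it has as many entries as there are cluster levels
print(pal)
```

A vector like `pal` could then be supplied through the `colors` argument of `plot_ly()`, alongside `color = clusters`, instead of relying on the default palette.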
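# Treat cluster labels as a factor so each cluster gets a discrete color
clusters <- factor(clus$cluster)

# Plot the 2-D UMAP embedding, one marker per document
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = svd_ump[, 1],
    y = svd_ump[, 2],
    # Tooltip shows heading, raw text, and cluster number
    text = ~paste('Heading:', head, "<br>Text: ", raw_text, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )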
# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus_wNoise.html")
5.1 Omit some noise points for more cluster clarity
We can reduce the noise on the plot by omitting the points with high outlier scores. Generally I hate doing this, because it is an easy way to accidentally lose something you didn’t know you wanted. However, it has its advantages as a strategy, and the outlier_scores returned by hdbscan() give a nice threshold to play with for further analytical paths.
# Keep only points whose hdbscan outlier score is below the chosen threshold
index_subset <- clus$outlier_scores < 0.6
data_subset <- svd_ump[index_subset, ]
raw_text_subset <- raw_text[index_subset]
head_subset <- head[index_subset]
clusters <- factor(clus$cluster[index_subset])

# Same plot as before, restricted to the retained points
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = data_subset[, 1],
    y = data_subset[, 2],
    text = ~paste('Heading:', head_subset, "<br>Text: ", raw_text_subset, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )
# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus.html")
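Before settling on a cutoff such as 0.6, it may help to see how many points each candidate threshold would keep. The following is a minimal sketch under stated assumptions: `scores_demo` is simulated data standing in for `clus$outlier_scores`, and the candidate thresholds are arbitrary choices for illustration.

```r
# Simulated outlier scores standing in for clus$outlier_scores (illustration only)
set.seed(42)
scores_demo <- runif(1000)

# Candidate thresholds to compare (arbitrary values)
thresholds <- c(0.4, 0.5, 0.6, 0.7, 0.8)

# Count the points that would be retained under each threshold
kept <- sapply(thresholds, function(t) sum(scores_demo < t))

# Summarize retention as counts and percentages
retention <- data.frame(
  threshold = thresholds,
  n_kept = kept,
  pct_kept = round(100 * kept / length(scores_demo), 1)
)
print(retention)
```

Replacing `scores_demo` with the real outlier scores would show how aggressively each threshold trims the data, which makes it easier to judge how much is being discarded along with the noise.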