5 The Grand Visualization
Note: We don’t have enough colors! The colors are recycled, but hopefully they will still help. Cluster numbers are included in the tooltip for certainty.
clusters = factor(clus$cluster)
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = svd_ump[,1],
    y = svd_ump[,2],
    text = ~paste('Heading:', head, "<br>Text: ", raw_text, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )
# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus_wNoise.html")
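If the recycled colors become a problem, one option is to build a palette with one color per cluster level. This is a minimal sketch, not part of the original analysis: cluster_palette() and toy_clusters are hypothetical names, and the "Dark 3" palette is an arbitrary choice. With many clusters the neighboring hues will still look similar, so the tooltip remains the reliable way to read the cluster number.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# returns one distinct hex color per level of a cluster factor.
cluster_palette <- function(cluster_factor) {
  n <- nlevels(cluster_factor)
  # hcl.colors() ships with base R (grDevices, R >= 3.6);
  # "Dark 3" is one of its qualitative palettes.
  grDevices::hcl.colors(n, palette = "Dark 3")
}

# Toy cluster labels standing in for clus$cluster
toy_clusters <- factor(c(0, 1, 1, 2, 5, 5, 9))
pal <- cluster_palette(toy_clusters)
print(pal)
```

With the real data this would be pal <- cluster_palette(clusters), and the vector could then be passed to the trace through plotly's colors argument (colors = pal) alongside color = clusters.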
5.1 Omit some noise points for more cluster clarity
We can reduce the noise on the plot by omitting some of the points with high outlier scores. Generally I hate doing this because it can be a good way to accidentally lose something you didn’t know you wanted. However, it can have its advantages as a strategy, and the outlier_scores returned by hdbscan() give a nice threshold to play with for further analytical paths.
index_subset = clus$outlier_scores < 0.6
data_subset = svd_ump[index_subset,]
raw_text_subset = raw_text[index_subset]
head_subset = head[index_subset]
clusters = factor(clus$cluster[index_subset])
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = data_subset[,1],
    y = data_subset[,2],
    text = ~paste('Heading:', head_subset, "<br>Text: ", raw_text_subset, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )
# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus.html")
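Before settling on a cutoff such as 0.6, it can help to see how many points each candidate threshold would keep. The sketch below is an illustration under assumptions: retention_by_threshold() is a hypothetical helper, and toy_scores is made-up data standing in for clus$outlier_scores.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# counts how many points survive each outlier-score cutoff.
retention_by_threshold <- function(scores, thresholds = c(0.4, 0.6, 0.8)) {
  data.frame(
    threshold = thresholds,
    # number of points with a score strictly below the cutoff
    n_kept    = sapply(thresholds, function(t) sum(scores < t)),
    # the same quantity as a share of all points
    prop_kept = sapply(thresholds, function(t) mean(scores < t))
  )
}

# Made-up scores in [0, 1], standing in for clus$outlier_scores
toy_scores <- c(0.05, 0.10, 0.30, 0.55, 0.62, 0.75, 0.90, 0.95)
retention <- retention_by_threshold(toy_scores)
print(retention)
```

On the real data the call would be retention_by_threshold(clus$outlier_scores). A sharp drop in prop_kept between two neighboring thresholds suggests the lower one is discarding more than stray noise.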