5 The Grand Visualization

Note: We don’t have enough colors! The colors are recycled, but hopefully they will still help. Cluster numbers are shown in the tooltip for certainty.

# Treat cluster labels as a factor so plotly maps them to discrete colors
clusters <- factor(clus$cluster)

fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = svd_ump[, 1],
    y = svd_ump[, 2],
    # Tooltip: heading, raw text, and cluster number for each point
    text = ~paste('Heading:', head, "<br>Text: ", raw_text, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )

# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus_wNoise.html")

Full Page Visualization
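
If the recycled colors noted above become a problem, one possible workaround is to interpolate a longer palette so that every cluster gets its own entry. This is only a sketch: `n_clusters` and the base colors are hypothetical values chosen for illustration, and interpolated neighbours can be hard to tell apart, so the tooltip remains the reliable way to identify a cluster.

```r
# Hypothetical number of clusters; in the analysis above this would be
# something like nlevels(clusters).
n_clusters <- 40

# A handful of hand-picked base colors (an assumption, not from the original).
base_cols <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A",
               "#66A61E", "#E6AB02", "#A6761D", "#666666")

# colorRampPalette() returns a function that interpolates between the base
# colors and produces as many hex codes as requested.
palette_fn <- grDevices::colorRampPalette(base_cols)
cluster_colors <- palette_fn(n_clusters)

head(cluster_colors)
```

A vector like `cluster_colors` could then be supplied through the `colors` argument of `plot_ly()`.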

5.1 Omit some noise points for more cluster clarity

We can reduce the noise on the plot by omitting some of the points with high outlier scores. Generally I hate doing this, because it can be a good way to accidentally lose something you didn’t know you wanted. However, it could have its advantages as a strategy, and the outlier_scores returned by hdbscan() make a nice threshold to play with for further analytical paths.

# Keep only points whose outlier score is below the 0.6 cutoff
index_subset <- clus$outlier_scores < 0.6
data_subset <- svd_ump[index_subset, ]
raw_text_subset <- raw_text[index_subset]
head_subset <- head[index_subset]
clusters <- factor(clus$cluster[index_subset])

fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = data_subset[, 1],
    y = data_subset[, 2],
    # Tooltip: heading, raw text, and cluster number for each retained point
    text = ~paste('Heading:', head_subset, "<br>Text: ", raw_text_subset, "<br>Cluster Number: ", clusters),
    hoverinfo = 'text',
    color = clusters,
    marker = list(opacity = 0.6),
    showlegend = FALSE
  )
# saveWidget(fig, "docs/All_clusters_noTopics_UMAPClus.html")

Full Page Visualization
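
Before settling on a cutoff, it can help to see how many points each candidate threshold would keep. The sketch below uses randomly generated scores as a stand-in for `clus$outlier_scores`, and the candidate thresholds other than the 0.6 used above are arbitrary assumptions.

```r
# Hypothetical stand-in for clus$outlier_scores (values between 0 and 1).
set.seed(42)
outlier_scores <- runif(1000)

# Candidate cutoffs to compare; 0.6 is the one used in the section above.
thresholds <- c(0.4, 0.6, 0.8)

# For each cutoff, count the points that would survive the filter.
points_kept <- sapply(thresholds, function(th) sum(outlier_scores < th))

retention <- data.frame(
  threshold   = thresholds,
  points_kept = points_kept,
  share_kept  = points_kept / length(outlier_scores)
)
print(retention)
```

With the real scores, a sharp drop in `share_kept` between two neighbouring thresholds would be a hint that the stricter cutoff is discarding a lot of data.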