3 UMAP
Key Takeaways:
- UMAP is fantastic for visual dimension reduction.
- The UMAP visualizations with GloVe vector input were not as sharp as those with singular vector input.
- Using 15-25 singular vectors as suggested by the screeplot made a fine visualization. However, increasing the number to 150 seemed to help (qualitatively speaking) and did not take much longer, so we stuck with 150.
- This 2D approximation would provide convenient inputs for a predictive topic model. UMAP does have a function to project new data onto the space, so we could watch incoming data be sorted into groups like earnings reports, government data releases, dividend announcements, etc.
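The last takeaway, projecting new data onto an existing embedding, can be sketched as follows. This is only an illustration under stated assumptions: it uses the `uwot` package (the text does not say which UMAP implementation was used), and the matrices `train` and `new_docs` are synthetic stand-ins for documents already expressed in the same reduced space. Real incoming documents would first have to be mapped onto the same singular vectors.

```r
library(uwot)  # assumption: uwot's UMAP implementation, not confirmed by the text

set.seed(42)
# Hypothetical stand-ins: 200 "training" documents and 10 "new" documents,
# each described by 20 reduced dimensions.
train    <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)
new_docs <- matrix(rnorm(10 * 20),  nrow = 10,  ncol = 20)

# ret_model = TRUE keeps what is needed to embed new points later.
model <- umap(train, n_neighbors = 15, n_components = 2, ret_model = TRUE)

# Project the new documents into the existing 2D space.
new_coords <- umap_transform(new_docs, model)
dim(new_coords)
```

The fitted coordinates of the training documents are in `model$embedding`, so new points can be overlaid on the same plot.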
We’ll use the mathemagical Uniform Manifold Approximation and Projection (UMAP) algorithm to project the already dimension-reduced data (150 singular vectors) into 2-space. UMAP is a dimension reduction technique that builds on the notion of neighbor graphs with ideas from topology. It is similar to t-SNE in its approach, but its fundamentals rest on firmer (and more complicated) mathematical theory (manifolds/topology).
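To make the "neighbor graph" idea concrete, here is a minimal base-R sketch that finds each point's k nearest neighbors in a toy data set. It is not UMAP itself, which goes on to weight and symmetrize such a graph before optimizing a low-dimensional layout. The matrix `X` and the value of `k` are made up for illustration.

```r
set.seed(1)
X <- matrix(rnorm(6 * 3), nrow = 6)  # 6 toy points in 3 dimensions
k <- 2                               # number of neighbors per point

D <- as.matrix(dist(X))              # pairwise Euclidean distances
diag(D) <- Inf                       # a point is not its own neighbor

# Row i holds the indices of the k points closest to point i.
knn <- t(apply(D, 1, function(d) order(d)[1:k]))
knn
```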
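library(plotly)
# The UMAP embedding was computed once and cached; uncomment to recompute.
# svd_ump = umap(svd$v[,1:150])
# save(svd_ump, file='docs/final_data_plots/svd_ump.RData')
load('docs/final_data_plots/svd_ump.RData')
# Interactive scatter plot of the 2D embedding. The hover text uses
# `head` and `raw_text`, which must already exist in the session.
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = svd_ump[,1],
    y = svd_ump[,2],
    text = ~paste('heading:', head, "<br>text: ", raw_text),
    hoverinfo = 'text',
    marker = list(color = 'green', opacity = 0.6),
    showlegend = FALSE
  )
# saveWidget(fig, "docs/UMAP_noClusters.html")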