7 UMAP of Full Sparse Matrix in Python
We first write the sparse, normalized tf-idf matrix from R to a file in Matrix Market format:
writeMM(tfidf_tdm, file='/Users/shaina/Library/Mobile Documents/com~apple~CloudDocs/Datasets and Code/reuters21578/tfidf_norm_tdm')
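Matrix Market is a plain-text coordinate format, which is what lets a matrix written from R be read by SciPy. As a minimal sketch, the round trip below uses a made-up 3 x 2 matrix and a temporary file (both are assumptions for illustration, not the Reuters data) to show that `scipy.io.mmread` hands back a sparse matrix in COO format with its shape and nonzero entries intact:

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Hypothetical toy matrix standing in for the tf-idf matrix.
toy = scipy.sparse.csr_matrix(
    np.array([[0.0, 0.5], [0.25, 0.0], [0.0, 1.0]])
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.mtx")
    scipy.io.mmwrite(path, toy)      # write in Matrix Market format
    loaded = scipy.io.mmread(path)   # read back as a sparse matrix

print(loaded.format, loaded.shape, loaded.nnz)
```

Because the result is in COO format, it usually needs converting (for example with `.tocsr()`) before row-wise work.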
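Then, after installing umap-learn from source with the shell commands below, we read the matrix into Python, fit UMAP with the cosine metric, and plot the resulting embedding: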
git clone https://github.com/lmcinnes/umap
cd umap
pip install --user -r requirements.txt
python setup.py install --user
import numpy as np
import scipy.io
import scipy.sparse
import sympy
import sklearn.datasets
import sklearn.feature_extraction.text
import umap.umap_ as umap
import umap.plot
import matplotlib.pyplot as plt
import csv
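# Read the Matrix Market file written from R (returned as a sparse COO matrix)
A = scipy.io.mmread('/Users/shaina/Library/Mobile Documents/com~apple~CloudDocs/Datasets and Code/reuters21578/tfidf_norm_tdm')
A = A.tolil()
# Transpose the term-document matrix so that each row is a document
A = A.transpose()
mapper = umap.umap_.UMAP(metric='cosine', random_state=42, low_memory=True).fit(A)
# Plot the embedding, coloring each point by its row index
umap.plot.points(mapper, values=np.arange(19744), theme='viridis')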
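The transpose matters because the file holds a term-document matrix (as the name `tfidf_tdm` suggests), while estimators that follow the scikit-learn convention, UMAP included, treat each row as one sample. A minimal sketch with a made-up 4-term by 3-document matrix, assuming only NumPy and SciPy, shows the change of orientation; the toy values and the names `tdm` and `dtm` are illustrative and not taken from the Reuters data:

```python
import numpy as np
import scipy.sparse

# Hypothetical toy term-document matrix: 4 terms (rows) x 3 documents (columns).
tdm = scipy.sparse.coo_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 0.0, 4.0],
    [0.0, 0.0, 5.0],
]))

# Transpose so that each row is a document, then convert to CSR,
# a format suited to row slicing.
dtm = tdm.transpose().tocsr()

print(tdm.shape)  # (4, 3)
print(dtm.shape)  # (3, 4)
```

After the transpose, the number of rows equals the number of documents, which is the number of points in the embedding.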
We then export the layout stored in mapper.embedding_ to a CSV file for exploration with our usual plot function:
filename = "/Users/shaina/Library/Mobile Documents/com~apple~CloudDocs/Datasets and Code/reuters21578/UMAPofTFIDFsparse.csv"
# writing to csv file
with open(filename, 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(mapper.embedding_)
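A file written this way has one line per point, one column per embedding dimension, and no header line, which is why it is later read with header=F. The sketch below illustrates that layout with Python's standard csv module only; the three coordinate pairs and the temporary file are made up for illustration and stand in for mapper.embedding_:

```python
import csv
import os
import tempfile

# Hypothetical 2-D coordinates standing in for mapper.embedding_.
embedding = [[0.5, -1.25], [2.0, 3.75], [-4.5, 0.0]]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "embedding.csv")
    # The csv module documentation recommends newline='' for files it handles.
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(embedding)
    # Read the rows back and convert the text fields to floats.
    with open(path, newline="") as csvfile:
        restored = [[float(v) for v in row] for row in csv.reader(csvfile)]

print(len(restored), len(restored[0]))
```

Passing `newline=''` is optional on macOS and Linux, but it avoids blank lines between rows if the same code is run on Windows.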
layout = read.csv("/Users/shaina/Library/Mobile Documents/com~apple~CloudDocs/Datasets and Code/reuters21578/UMAPofTFIDFsparse.csv", header=F)
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = layout[,1],
    y = layout[,2],
    marker = list(color = 'green',opacity=0.6),
    showlegend = F
  )
#saveWidget(fig, file='docs/SparseDataIntoUMAP.html')