The meaningful visualization of clusters
Over the last few weeks I have been reading and trying out recent papers that deal with unlabeled data, the really messy and unstructured kind. While experimenting with UMAP clustering, I found a beautiful way of visualizing clusters with images embedded in the plot, using the Bokeh package (https://umap-learn.readthedocs.io/en/latest/basic_usage.html).
The main problem was that this code was written for linked images, not for images stored in a local folder. Ladies and gentlemen, if you are looking for a way to display your clustered data, this article is just for you!
Usually, you get a plot like this after clustering:
How do you analyze this plot? It is not so easy to get an overview of it, huh? A nice interactive, zoomable plot like this one is much better:
All code is available as a Jupyter notebook here: https://github.com/zapaishchykova/medium-bokeh-viz
First of all, you need a dataset. For demonstration purposes I used Tiny ImageNet, specifically the test folder.
Maybe this is not the best data to cluster: UMAP in combination with HDBSCAN works pretty well on the MNIST dataset and on smaller images in general. I would say that 50 is the minimum number of dimensions that I would try to reduce to. Nevertheless, the purpose of this post is to show a nice and easy way to visualize the clustering results.
In the Jupyter notebook you can find the clustering done with UMAP + HDBSCAN, but this visualization approach will work with any clustering algorithm.
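The plotting code below expects every image as a string that the tooltip's <img src='@image'> can render directly. For local files that means a base64 data URI rather than a link. The notebook's own helper is not reproduced in this article, so here is a minimal sketch of what such a function might look like, using only the Python standard library. The name image_file_to_data_uri and the fallback MIME type are assumptions made for illustration, not the notebook's code.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_uri(path):
    """Read a local image file and return it as a base64 data URI.

    Hypothetical stand-in for the notebook's helper: the result can be
    used directly as the src of an <img> tag in a Bokeh hover tooltip.
    """
    path = Path(path)
    # Guess the MIME type from the file extension (e.g. image/png);
    # fall back to a generic binary type if the extension is unknown.
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        mime = "application/octet-stream"
    # Embed the raw file bytes, base64-encoded, after the data URI header.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Given a list of file paths, something like [image_file_to_data_uri(p) for p in paths] would fill the image column that the tooltip reads. The bytes are embedded without resizing, which suits small images such as the 64x64 Tiny ImageNet files. Larger images would inflate the plot considerably and are better downscaled first.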
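import numpy as np
import pandas as pd
from bokeh.models import CategoricalColorMapper, ColumnDataSource, HoverTool
from bokeh.palettes import Spectral10
from bokeh.plotting import figure, show

# embedding: 2-D UMAP coordinates, labels: cluster ids, instances: image files
df = pd.DataFrame(embedding, columns=('x', 'y'))
df['class'] = [str(x) for x in labels]
# b64_image_files is not shown in this snippet; judging by its name,
# it encodes each image file as a base64 string for the tooltip
df['image'] = list(map(b64_image_files, instances))

datasource = ColumnDataSource(df)
# colors is not defined in this snippet either; see the notebook
color_mapping = CategoricalColorMapper(
    factors=[str(colors - x) for x in np.unique(labels)],
    palette=Spectral10
)

plot_figure = figure(
    title='UMAP projection',
    plot_width=600,   # Bokeh 3.x renamed plot_width/plot_height to width/height
    plot_height=600,
    tools=('pan, wheel_zoom, reset')
)

# Show the embedded image and its class when hovering over a point
plot_figure.add_tools(HoverTool(tooltips="""
<div>
    <div>
        <img src='@image' style='float:left; width:100px;height:100px; margin: 5px 5px 5px 5px'/>
    </div>
    <div>
        <span style='font-size: 16px; color: #224499'>Class:</span>
        <span style='font-size: 18px'>@class</span>
    </div>
</div>
"""))

plot_figure.circle(
    'x',
    'y',
    source=datasource,
    color=dict(field='class', transform=color_mapping),
    line_alpha=0.6,
    fill_alpha=0.6,
    size=4
)
show(plot_figure)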
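One thing to watch with the snippet above: Spectral10 holds only ten colors, and as far as I know Bokeh's CategoricalColorMapper falls back to its nan_color for any factors beyond the palette length. HDBSCAN also labels noise points as -1, so the number of distinct classes can easily exceed ten. A minimal sketch of one way around this, in plain Python, is to build the factor list from the labels and repeat a base palette until every class has a color. The function name and the hex colors are placeholders for illustration.

```python
from itertools import cycle, islice


def factors_and_palette(labels, base_palette):
    """Return (factors, palette) with one color per distinct label.

    factors are the labels as strings, sorted numerically so that the
    HDBSCAN noise label -1 comes first; the base palette is repeated
    as often as needed to cover them all.
    """
    factors = [str(v) for v in sorted(set(int(l) for l in labels))]
    palette = list(islice(cycle(base_palette), len(factors)))
    return factors, palette


# Placeholder colors; any list of CSS color strings would do.
demo_palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]
factors, palette = factors_and_palette([2, 0, -1, 1, 3, 0], demo_palette)
print(factors)  # ['-1', '0', '1', '2', '3']
print(palette)  # ['#1f77b4', '#ff7f0e', '#2ca02c', '#1f77b4', '#ff7f0e']
```

The two lists could then be passed to CategoricalColorMapper as factors and palette. Repeating colors means some clusters share a hue, so a larger palette is preferable when one is available.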
By the way, if you are interested in the details of UMAP (which I totally recommend!), check them out here: https://umap-learn.readthedocs.io/en/latest/clustering.html, and see some simply beautiful visualizations of what it can do here: https://johnhw.github.io/umap_primes/index.md.html.
Feel free to add your suggestions in the comment section below!