The meaningful visualization of clusters

Anna Zapaishchykova
3 min read · Aug 29, 2019
This cat is waiting for a neural network to train (Photo by Marnhe du Plooy on Unsplash)

Over the last few weeks I have been reading and trying out recent papers that deal with unlabeled data, the really messy and unstructured kind. After trying UMAP clustering, I found a beautiful way to visualize clusters with the images embedded in the plot, using the bokeh package (https://umap-learn.readthedocs.io/en/latest/basic_usage.html).

The main problem was that this code was written for linked images, not for images stored in a local folder. Ladies and gentlemen, if you are looking for a way to display your clustered data, this article is for you!
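A bokeh tooltip can show an image through an `<img src=...>` tag, and a base64 data URI lets that tag display a file from your own folder without any URL. Below is a minimal sketch of a helper in the spirit of the `b64_image_files` function used in the plotting code; this implementation is an assumption, not necessarily the one in the notebook, and it assumes `instances` holds file paths to images the browser can decode (PNG, JPEG).

```python
import base64
import mimetypes
from pathlib import Path


def b64_image_files(path):
    """Return a base64 data URI for a local image file (hypothetical sketch).

    The returned string can be used directly as the src of an <img> tag,
    so the browser needs no web server or external link to show the image.
    """
    path = Path(path)
    # Guess the MIME type from the extension (e.g. .png -> image/png);
    # fall back to a generic binary type if the extension is unknown.
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        mime = "application/octet-stream"
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Each image is embedded in the page itself, so the notebook output grows with the number and size of the images; small images such as Tiny ImageNet's keep this manageable.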

After clustering, you usually end up with a plot like this:

A plot made with the seaborn package from some random data

How do you analyze this plot? It is not so easy to get an overview, huh? It is much better to get a nice, interactive, zoomable plot like this one:

The plot made by bokeh

All code is available as a Jupyter notebook here: https://github.com/zapaishchykova/medium-bokeh-viz

First of all, you need the dataset. For demonstration purposes I used Tiny ImageNet, specifically the test folder.
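To feed a folder of images into the pipeline, you first need a list of files. Here is a minimal sketch, assuming the images sit somewhere under one folder with one of the extensions listed below; `list_image_files`, `IMAGE_EXTENSIONS` and the path in the usage line are hypothetical.

```python
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your dataset).
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def list_image_files(folder):
    """Return a sorted list of image paths found anywhere under `folder`."""
    return sorted(
        p
        for p in Path(folder).rglob("*")
        # Skip directories and non-image files; compare extensions case-insensitively.
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `instances = list_image_files("tiny-imagenet-200/test")` (hypothetical path). Sorting keeps the order stable, so row i of the embedding, label i and file i refer to the same image as long as all three are built from this list in order.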

Maybe this is not the best data to cluster: UMAP in combination with HDBSCAN works pretty well on the MNIST dataset and on smaller images in general. I would say that 50 is the minimum number of dimensions that I would try to reduce to. Nevertheless, the purpose of this post is to show a nice and easy way to visualize the clustering results.

In the Jupyter notebook you can find the clustering with UMAP + HDBSCAN, but this approach works with any clustering algorithm.
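The plotting code only needs two things from the clustering step: a 2-D `embedding` with one row per image and a `labels` array with one label per row. To illustrate that the clustering algorithm is interchangeable, here is a sketch that swaps HDBSCAN for plain k-means written with NumPy; `kmeans` is a hypothetical helper for this example, and unlike HDBSCAN, k-means needs the number of clusters up front and never marks points as noise.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) returning one integer label per row."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels
```

With `labels = kmeans(embedding, k=10)` in place of the HDBSCAN labels, the DataFrame and plotting code stay the same, because they only look at one label per point.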

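# Imports needed by this snippet (run inside a Jupyter notebook)
import numpy as np
import pandas as pd
from bokeh.models import CategoricalColorMapper, ColumnDataSource, HoverTool
from bokeh.palettes import Spectral10
from bokeh.plotting import figure, output_notebook, show

output_notebook()  # render Bokeh plots inline in the notebook

# Inputs from the clustering step in the notebook:
#   embedding: array of shape (n_images, 2) with the 2-D UMAP coordinates
#   labels:    one cluster label per image (HDBSCAN marks noise as -1)
#   instances: the images, turned into base64 strings by b64_image_files
#              (defined in the notebook) so they can be shown in the tooltip
df = pd.DataFrame(embedding, columns=('x', 'y'))
df['class'] = [str(x) for x in labels]
df['image'] = list(map(b64_image_files, instances))
datasource = ColumnDataSource(df)

# One factor per cluster label; the factors must match the strings in df['class'].
# Spectral10 has only 10 colors, so use a longer palette for more clusters.
color_mapping = CategoricalColorMapper(
    factors=[str(x) for x in np.unique(labels)],
    palette=Spectral10
)

plot_figure = figure(
    title='UMAP projection',
    width=600,   # called plot_width / plot_height in older Bokeh versions
    height=600,
    tools='pan,wheel_zoom,reset'
)

# Hover tooltip: @image and @class are looked up in the ColumnDataSource
plot_figure.add_tools(HoverTool(tooltips="""
<div>
    <div>
        <img src='@image' style='float:left; width:100px;height:100px; margin: 5px 5px 5px 5px'/>
    </div>
    <div>
        <span style='font-size: 16px; color: #224499'>Class:</span>
        <span style='font-size: 18px'>@class</span>
    </div>
</div>
"""))

# One small, semi-transparent circle per image, colored by its cluster
plot_figure.scatter(
    'x',
    'y',
    source=datasource,
    marker='circle',
    color=dict(field='class', transform=color_mapping),
    line_alpha=0.6,
    fill_alpha=0.6,
    size=4
)
show(plot_figure)

Another example of visualization with bokeh.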
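`Spectral10` provides only ten colors, and HDBSCAN adds a `-1` label for noise, so with many clusters some factors may end up without a palette entry and be drawn in the mapper's `nan_color`. One option, sketched here under the assumption that the labels are strings as in `df['class']`, is to generate one evenly spaced hue per cluster and a grey for noise; `palette_for` and `NOISE_COLOR` are hypothetical names.

```python
import colorsys

NOISE_COLOR = "#bbbbbb"  # hypothetical grey for points labelled as noise ("-1")


def palette_for(factors, noise_factor="-1"):
    """Return one hex color per factor: evenly spaced hues, grey for noise."""
    clusters = [f for f in factors if f != noise_factor]
    colors = {}
    for i, factor in enumerate(clusters):
        # Spread the hues evenly around the color wheel so no two clusters share one.
        r, g, b = colorsys.hsv_to_rgb(i / max(len(clusters), 1), 0.65, 0.9)
        colors[factor] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    colors[noise_factor] = NOISE_COLOR
    # Keep the same order as `factors`, which is how the color mapper pairs them.
    return [colors[f] for f in factors]
```

For example, with `factors = [str(x) for x in np.unique(labels)]`, pass `CategoricalColorMapper(factors=factors, palette=palette_for(factors))`. Evenly spaced hues stay distinct, although neighbouring hues become hard to tell apart once there are a few dozen clusters.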

By the way, if you are interested in the details of UMAP (which I totally recommend!), check them out here: https://umap-learn.readthedocs.io/en/latest/clustering.html, and see some simply beautiful visualizations of what it can do here: https://johnhw.github.io/umap_primes/index.md.html.

Feel free to add your suggestions in the comment section below!

The cat is happy with visualization. (Photo by Fernando Lavin on Unsplash)
