2013-12-19

Creating Tagclouds with PyTagCloud

Tag clouds are a nice way to visualize textual information. They provide a colorful overview of frequent terms of a text and they might also tell you something about it's writing style.

For instance, the following is a tag cloud of the famous paper "Cramming more components onto integrated circuits" by Gordon Moore. The script that was used to create it, can be downloaded here.
The script uses PyTagCloud, which gets most of the job done. Cloning the git repository, building and installing is straight-forward. Do not forget to have pygame installed.

Nice Tagclouds can not be created fully automatically. To create beautiful tag clouds, natural language text usually needs a bit of preprocessing. The script provided above uses NLTK for stop word removal and calculating term frequencies. Moreover it might be necessary to manually change term frequencies or to remove certain terms entirely.

PyTagCloud supports exporting the tag cloud to .png images. Exporting to HTML/CSS is also almost possible, but this feature seems a little broken at the time of this writing. PyTagCloud will not export correctly whether a term should be rotated or not resulting in tag clouds with overlapping terms.

No comments:

Post a Comment