A Map of the Geographic Structure of Wikipedia Topics

A large number of Wikipedia articles are geocoded: when an article pertains to a location, its latitude and longitude are attached to the article. As you can imagine, this is useful for generating insightful and eye-catching infographics. A while ago, a team at Oxford built a magnificent tool to illustrate the language boundaries in Wikipedia articles. This led me to wonder whether it would be possible to do something similar for the different topics in Wikipedia.

This is exactly what I managed to do over the past few days. I downloaded all of Wikipedia, extracted 300 different topics using a powerful clustering algorithm, projected all the geocoded articles onto a map, and highlighted the different clusters (or topics) in red. The results were much more interesting than I expected. For example, the map on the left shows all the articles related to mountains, peaks, summits, etc. in red on a blue base map. The highlighted articles from this topic match the main mountain ranges exactly.
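
The pipeline described above (cluster the articles into topics, project the geocoded ones onto a map, highlight one topic in red) can be sketched roughly as follows. This is an illustrative toy, not the code behind the actual map: the four sample articles, the plain k-means clustering on word counts, the equirectangular projection, and all function names are assumptions, since the post does not name the algorithm or the projection that was used.

```python
from collections import Counter

# Hypothetical toy data: (title, text, latitude, longitude).
ARTICLES = [
    ("Mont Blanc", "mountain peak summit alps glacier", 45.83, 6.86),
    ("Port of Rotterdam", "port harbour shipping cargo", 51.95, 4.14),
    ("Matterhorn", "mountain peak summit alps ridge", 45.98, 7.66),
    ("Port of Hamburg", "port harbour shipping river", 53.54, 9.97),
]


def tf_vector(text, vocab):
    """Bag-of-words vector: one word count per vocabulary entry."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means, seeded with the first k vectors (an assumption
    made here only to keep the example deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def project(lat, lon, width, height):
    """Equirectangular projection from degrees to pixel coordinates."""
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


vocab = sorted({word for _, text, _, _ in ARTICLES for word in text.split()})
vectors = [tf_vector(text, vocab) for _, text, _, _ in ARTICLES]
labels = kmeans(vectors, k=2)

# Highlight one topic in red; every other article stays on the blue base layer.
highlighted_topic = labels[0]  # the cluster that contains "Mont Blanc"
points = [
    (title, project(lat, lon, 1024, 512),
     "red" if label == highlighted_topic else "blue")
    for (title, _, lat, lon), label in zip(ARTICLES, labels)
]

for title, (x, y), color in points:
    print(f"{title}: ({x:.1f}, {y:.1f}) {color}")
```

With real data, the same shape would apply at a much larger scale: one vector per article, 300 clusters instead of 2, and one map per highlighted cluster.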

COMMENTS

llh

How do you handle high-quality geographic images with good performance? In this example, I tried to zoom in (+), but the image quality is not very high, and it is slow to display the part I want to look at. I do not know whether this is because of my network connection speed. But I think user experience is very important, especially in a web application. Big data integration is also a problem: how do you load big data efficiently?

 
Posted Feb 4, 2013
 
Tags: geocoded, topic modeling, Wikipedia