Collecting and filtering PowerPoint slides

An experiment in collecting and filtering Microsoft PowerPoint slides.

I wrote a custom automated tool that scrapes Google results for .ppt files related to any given keyword. After downloading 1,000 randomly chosen slideshows, I extracted the text from each one to learn more about the data.
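The collection step can be sketched roughly as follows. This is a minimal illustration, not the actual tool: the helper names (`ppt_query`, `ppt_urls`, `sample_urls`) are hypothetical, the `filetype:ppt` search operator is an assumption about how results might be restricted, and the actual fetching of search results and files is left out.

```ruby
require "uri"

# Hypothetical helper: builds a search query restricted to .ppt files.
# Using the "filetype:" operator is an assumption, not the original method.
def ppt_query(keyword)
  "#{keyword} filetype:ppt"
end

# Keeps only URLs whose path ends in .ppt (case-insensitive).
# Strings that cannot be parsed as URLs are dropped.
def ppt_urls(urls)
  urls.select do |url|
    begin
      URI.parse(url).path.to_s.downcase.end_with?(".ppt")
    rescue URI::InvalidURIError
      false
    end
  end
end

# Picks up to `count` distinct URLs at random.
# Passing a seed makes the selection repeatable.
def sample_urls(urls, count, seed: nil)
  rng = seed ? Random.new(seed) : Random.new
  urls.uniq.sample(count, random: rng)
end
```

With a list of candidate links in hand, `sample_urls(ppt_urls(links), 1000)` would return at most 1,000 distinct .ppt links to download.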

Based on the text content of each slide, I composed mosaics of the most text-heavy slides and of the slides with the least content. I also assembled a mosaic of the backgrounds extracted from all collected slides, sorted by average brightness.
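The two orderings behind these mosaics can be sketched as below. This is an illustration under stated assumptions, not the original code: text weight is measured as a plain word count, brightness as the mean of the red, green and blue channels, and the slide and background hashes (`:id`, `:text`, `:pixels`) are a hypothetical data layout.

```ruby
# Counts whitespace-separated words in a slide's extracted text.
def word_count(text)
  text.to_s.split.size
end

# Returns the `n` slides with the most words and the `n` with the fewest.
# Each slide is a hash with :id and :text keys (a hypothetical layout).
def text_extremes(slides, n)
  ranked = slides.sort_by { |slide| -word_count(slide[:text]) }
  { heaviest: ranked.first(n), emptiest: ranked.last(n).reverse }
end

# Average brightness of an image given as an array of [r, g, b] pixels,
# each channel in 0..255. The plain channel mean is an assumption.
def average_brightness(pixels)
  return 0.0 if pixels.empty?
  pixels.sum { |r, g, b| (r + g + b) / 3.0 } / pixels.size
end

# Orders backgrounds from darkest to brightest.
def sort_by_brightness(backgrounds)
  backgrounds.sort_by { |bg| average_brightness(bg[:pixels]) }
end
```

In practice the pixel data would come from an image library; plain arrays keep the sketch self-contained.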

By Antoni Kaniowski

Posted Nov 10, 2012
Views: 1426
Tags: PowerPoint
Tools: ImageMagick, IrfanView, Processing, Ruby, Sheet Maker