Bulk downloading Wikimedia Commons categories

Published by Taavi Väänänen on October 13, 2024.

Wikimedia Commons, the Wikimedia project for freely licensed media files, also contains a bunch of photos by me and photos of me at various events. While I don't think Commons is going away anytime soon, I would still like to have a local copy of those images available on my own storage hardware.

Obviously this requires some way to query for photos you want to download. I'm using Commons categories for this, since that's easy to implement and works for both use cases. The Commons community tends to come up with very specific categories that you can use, and if not, you can usually categorize the files yourself.

Me replying 'shh' to a Discord message showing myself categorizing photos about me and accusing me of COI editing

thankfully Commons has no such thing as a Conflict of interest (COI) policy

There is almost an existing tool for this: Sam Wilson's mwcli project has support for exporting images one has uploaded to Commons. However I couldn't use that to upload photos of me others have uploaded, plus it's written in PHP and I don't exactly want to deal with the problem of figuring out how to package it in a way I could neatly install it on my NAS.

So I wrote my own tool for it, called comload. It's written in Python because Python is easy to deploy (I can just throw it in a .deb and upload it to my internal repository), and because I did not find a Go library to handle Action API pagination for me. The basic usage is like this:

$ comload --subcats "Taavi Väänänen"

This will download any files in Category:Taavi Väänänen and its sub-categories to the current directory. Former image versions, as well as the image description and SDC data, if any, is also included. And it's smart enough to not download any files that are already there on future runs, so you can just throw it in a systemd timer to get any future files. I'd still like it to handle moved files without creating a duplicate copy, but otherwise I'm really happy with the current state.

comload is available from PyPI and from my Git server directly, and is licensed under the GPLv3.

This article is tagged as: project wikimedia

Feedback? Please email any comments to hi@taavi.wtf, or toot them at @taavi@wikis.world.