User:(WT-en) Hotelmakerbot/Implementation
The hotelmaker bot was implemented in Perl using the excellent MediaWiki.pm interface module written by Edward Chernenko.
Caching
This bot uses caching to reduce the load on the Wikivoyage server. Here's how it works:
- The bot keeps a copy of the Pagemap for Wikivoyage. It never attempts to read or edit a page that is not in the Pagemap.
- The bot keeps a copy of Project:Links to disambiguating pages. It never attempts to read or edit a disambiguation page.
- Whenever a page is fetched, it is added to the cache.
- Before attempting to edit a page, the bot checks whether the page is in the cache. If it is, the bot loads the cached version and checks whether the page can be usefully edited. If it cannot, the bot skips the page, which avoids pestering the Wikivoyage server unnecessarily.
- Before editing a page, the page is deleted from the cache. If the edit is successful, the new version of the page is added to the cache.
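The rules above can be sketched in a few lines. This is only an illustration written in Python, not the bot's actual Perl code: the class name `CachingBot`, the `fetch` and `save` callables (stand-ins for server reads and writes), and the `transform` callback are all hypothetical, and the cache is a plain in-memory dictionary rather than whatever storage the bot really uses.

```python
from typing import Callable, Dict, Optional, Set


class CachingBot:
    """Illustrative sketch of the caching rules; every name here is hypothetical."""

    def __init__(self, pagemap: Set[str], disambiguation: Set[str],
                 fetch: Callable[[str], str],
                 save: Callable[[str, str], bool]) -> None:
        self.pagemap = pagemap                # titles listed in the Pagemap
        self.disambiguation = disambiguation  # titles of disambiguation pages
        self.fetch = fetch                    # assumed: reads a page from the server
        self.save = save                      # assumed: writes a page, True on success
        self.cache: Dict[str, str] = {}

    def get_page(self, title: str) -> Optional[str]:
        # Never touch pages outside the Pagemap or disambiguation pages.
        if title not in self.pagemap or title in self.disambiguation:
            return None
        # Only ask the server when the page is not cached yet.
        if title not in self.cache:
            self.cache[title] = self.fetch(title)
        return self.cache[title]

    def try_edit(self, title: str,
                 transform: Callable[[str], Optional[str]]) -> bool:
        text = self.get_page(title)
        if text is None:
            return False
        new_text = transform(text)
        # Skip pages that cannot be usefully edited; no server write happens.
        if new_text is None or new_text == text:
            return False
        # Delete the cached copy before editing, restore it only on success.
        self.cache.pop(title, None)
        if self.save(title, new_text):
            self.cache[title] = new_text
            return True
        return False
```

In this sketch, a second call to `try_edit` for the same page reads from the dictionary instead of calling `fetch` again, which is the behaviour the list describes.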
Periodically I manually download a new copy of the Pagemap, which includes a timestamp for when each page was last edited. I then run a local script that compares the new Pagemap to the old one and purges from the cache any page that has changed since the old Pagemap was downloaded. A minor downside is that any page edited by the bot may be downloaded again in the future, since the bot itself is responsible for the change, but that is a small price to pay considering the effort being expended to avoid pestering the server.
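A purge of this kind might look roughly like the following Python sketch. It is not the actual local script: the function name `purge_stale` is made up, each Pagemap is assumed to have been parsed into a dictionary from page title to last-edit timestamp string, and a page is treated as changed whenever its timestamp differs between the two copies (or it is missing from the new one).

```python
from typing import Dict, List


def purge_stale(cache: Dict[str, str],
                old_pagemap: Dict[str, str],
                new_pagemap: Dict[str, str]) -> List[str]:
    """Drop cached pages whose last-edit timestamp changed; return their titles."""
    purged = []
    # Iterate over a copy of the keys because the cache is modified in the loop.
    for title in list(cache):
        # A differing (or missing) timestamp means the cached copy may be stale.
        if new_pagemap.get(title) != old_pagemap.get(title):
            del cache[title]
            purged.append(title)
    return sorted(purged)
```

Because the comparison only looks at timestamps, pages the bot edited itself are purged as well, which matches the minor downside described above.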
A nice accidental benefit of this is that the bot can be cancelled and restarted at will. Restarts always run through the entire hotel database, but any city that has already been checked or edited is quickly skipped thanks to caching, so the Wikivoyage server is only asked for articles that were not loaded into the cache on a previous run.