I've written a script that scrapes the websites of some major hotel chains to gather all the info needed to create a Wikivoyage accommodation listing.
I have big, unrealistic, and totally unrealized plans for this tool. Right now it does the following:
- Scrapes major chains
- Emits a wiki-formatted list of entries
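To illustrate the "emit" step, here is a minimal sketch of turning one scraped record into a wiki-formatted listing line. It is not the script's actual code: the `format_listing` function, the dictionary keys, and the sample hotel are all hypothetical, and it assumes the output uses the `{{sleep}}` listing template with `name`, `url`, `address`, `phone`, and `content` parameters.

```python
# Hypothetical sketch: render scraped hotel records as {{sleep}} listing lines.
# The field names and the sample record are assumptions, not the real script's data.

FIELDS = ["name", "url", "address", "phone"]


def escape(value: str) -> str:
    """Escape pipe characters so they don't split the template parameters."""
    return value.replace("|", "{{!}}")


def format_listing(hotel: dict) -> str:
    """Return one bulleted listing line for a single hotel record."""
    parts = [f"{key}={escape(hotel.get(key, ''))}" for key in FIELDS]
    # Leave content empty so a human editor can add a description later.
    parts.append("content=")
    return "* {{sleep | " + " | ".join(parts) + " }}"


if __name__ == "__main__":
    hotels = [
        {
            "name": "Example Inn Newark",
            "url": "https://www.example.com/newark",
            "address": "123 Example St",
            "phone": "+1 510-555-0100",
        },
    ]
    for hotel in hotels:
        print(format_listing(hotel))
```

Keeping each entry on one line with a fixed parameter order makes the output easy to copy verbatim and easy to compare against later runs.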
I eventually hope to do the following too:
- Scrape Wikivoyage to figure out the correct name for a city. For example, is it Newark or Newark (California)?
- Keep a database of entries.
- Keep track of which entries have been inserted into articles so as to avoid re-adding them later if someone deletes them (see Project:Avoid negative reviews).
- Be able to update entries when the information on a hotel's website changes.
- If someone adds descriptive text, be able to merge changes in while respecting the human-added text.
- Be able to add a new hotel chain, and emit just the entries for that chain.
- If the number of entries gets out of hand (Las Vegas), be able to randomly select a few automatically.
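Two of the planned features above, skipping entries that were already inserted once and randomly trimming an oversized city list, could be combined roughly as follows. This is only a sketch of one possible approach, since none of this exists yet: `pick_entries`, its parameters, and the default limit of 5 are hypothetical.

```python
import random

# Hypothetical sketch: choose which entries to emit for one city.
# `already_inserted` stands in for the planned database of entries that
# have been added to an article before (even if later deleted).


def pick_entries(candidates, already_inserted, limit=5, seed=None):
    """Return at most `limit` entries, never re-offering one already inserted."""
    fresh = [c for c in candidates if c not in already_inserted]
    if len(fresh) <= limit:
        return fresh
    # A seeded generator makes the selection repeatable between runs.
    rng = random.Random(seed)
    return rng.sample(fresh, limit)


if __name__ == "__main__":
    las_vegas = [f"Example Hotel {i}" for i in range(1, 41)]
    inserted_before = {"Example Hotel 3", "Example Hotel 7"}
    for name in pick_entries(las_vegas, inserted_before, limit=5, seed=2024):
        print(name)
```

Passing a fixed `seed` keeps the chosen subset stable if the script is rerun, which matters if the goal is to avoid churning the same article with different picks each time.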
If anyone wants access to the current list for a particular state, please ask and I'll upload it. But for the sake of my future dreams for the hotel program, I ask that you please copy entries verbatim. If the list for a particular city is large, it's fine to copy only some of the entries; just don't change the entries if you can help it.
Requests for Comments
Please feel free to make comments about all this stuff on the talk page. I'm especially interested in:
- Is the formatting correct?
- Is there any bogosity in the data?
Please copy entries verbatim (see the explanation above for why).
Consider clicking on the website entry for each hotel you add and verifying that the website, phone, and address are correct.