User:Cjensen~enwikivoyage/project/hotelmaker
I've written a script that scrapes the websites of some major hotel chains to gather all the info needed to create a Wikivoyage accommodation listing.
I have big, unrealistic, and totally unrealized plans for this tool. Right now it does the following:
- Scrapes major chains
- Emits a wiki-formatted list of entries
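As a rough illustration of the emit step, here is a minimal sketch in Python. It is not the actual script: the `Hotel` record, the `format_listing` and `format_city` helpers, and the choice of fields are all hypothetical, and it assumes the output should use the `{{sleep}}` listing template with `name`, `url`, `address`, and `phone` parameters.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    """Hypothetical record for one scraped hotel (field names are assumptions)."""
    name: str
    url: str
    address: str
    phone: str


def format_listing(hotel: Hotel) -> str:
    """Render one hotel as a single-line {{sleep}} listing (assumed template)."""
    fields = [
        ("name", hotel.name),
        ("url", hotel.url),
        ("address", hotel.address),
        ("phone", hotel.phone),
    ]
    body = " | ".join(f"{key}={value}" for key, value in fields)
    return "* {{sleep | " + body + " }}"


def format_city(hotels: list[Hotel]) -> str:
    """Emit all listings for one city, sorted by name, one per line."""
    ordered = sorted(hotels, key=lambda h: h.name)
    return "\n".join(format_listing(h) for h in ordered)
```

This sketch does not escape values, so a scraped field containing a `|` character would break the template and would need cleaning first.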
I eventually hope to do the following too:
- Scrape Wikivoyage to figure out the correct name for a city. For example, is it Newark or Newark (California)?
- Keep a database of entries.
- Keep track of which entries have been inserted into articles so as to avoid re-adding them later if someone deletes them (see Project:Avoid negative reviews).
- Be able to update entries when a hotel's website changes (for example, a new phone number or URL).
- If someone adds descriptive text, be able to merge changes in while respecting the human-added text.
- Be able to add a new hotel chain, and emit just the entries for that chain
- If the number of entries for a city gets out of hand (Las Vegas, for example), be able to randomly select a few automatically.
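Two of the planned items above, skipping entries that were already inserted once and randomly trimming an oversized city list, could be sketched roughly as follows. None of this exists yet: `select_entries`, its parameters, and the default limit of 10 are hypothetical, and entries are represented as plain strings for simplicity.

```python
import random


def select_entries(candidates, already_inserted, limit=10, seed=None):
    """Pick the entries to emit for one city.

    candidates: all scraped entries for the city (hypothetical: plain strings).
    already_inserted: entries previously added to an article; these are
        skipped so that a deliberately deleted entry is not re-added.
    limit: maximum number of entries to emit (an assumed default).
    seed: optional seed so a run can be reproduced.
    """
    fresh = [entry for entry in candidates if entry not in already_inserted]
    if len(fresh) <= limit:
        return fresh
    # Too many entries: take a random sample without replacement.
    rng = random.Random(seed)
    return rng.sample(fresh, limit)
```

Passing a fixed `seed` keeps the selection stable between runs, which would matter if the same city list is regenerated later.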
If anyone wants the current list for a particular state, please ask and I'll upload it. For the sake of my future plans for the hotel program, I ask that you copy entries verbatim. If the list for a particular city is large, it's fine to copy only some of the entries, but please don't change the entries if you can help it.
Requests for Comments
Please feel free to comment on any of this on the talk page. I'm especially interested in:
- Whether the formatting is correct.
- Any bogosity in the data.
Data
Please copy entries verbatim (see the explanation above).
Consider clicking on the website entry for each hotel you add and verifying that the website, phone, and address are correct.