Wikivoyage talk:Database dump

From Wikivoyage

Project forks

Actually, I think it is fair to mention Wikivoyage specifically here because the failure to provide database dumps, and the lack of even an explanation for this failure, is a good faith issue which drains off contributors to that site in particular, thereby weakening our contributor base and generally diluting the strength of the "free-content travel guide community" (if I can call it that?). --(WT-en) Peter Talk 00:54, 4 September 2007 (EDT)

For reference, see Project:How to re-use Wikivoyage guides#Keeping an up to date mirror and Project:How to re-use Wikivoyage guides#Data Dumps? for older discussions on the subject. Also see Project:How to re-use Wikivoyage guides#XML data feeds, which claims it is possible to get a full XML dump of Wikivoyage, although to my knowledge no one has ever received one. I suspect shared: has additional discussions on the subject, but I didn't search for them. -- (WT-en) Ryan • (talk) • 01:13, 4 September 2007 (EDT)
I agree that there should be dumps. I removed the WV link as part of an ongoing effort to keep their spam off of our site. – (WT-en) cacahuate talk 01:19, 4 September 2007 (EDT)
I think it's antagonizing to treat a mention of a related project as spam, especially when that project is highly relevant in this context because it offers database dumps.
Also note that I haven't given up on contributing to WT just yet, so I find it a bit odd to count "me" as part of "them". (WT-en) Guaka 10:27, 4 September 2007 (EDT)
I don't consider you part of them... but we've had a bit of a WV spam problem lately, which admittedly I have little tolerance for... the sentence as it is now is true and more general. Lack of dumps may lead to project forks... not just to WV. Personally if I left WT and wanted to fork, I would start something new, not an English version of WV... which is why I would prefer the sentence to be more broad. – (WT-en) cacahuate talk 03:14, 7 September 2007 (EDT)
Hi guys, I just wanted to weigh in that WV isn't the only project that has forked and had some interesting dealings over XML dumps. If I went into details it would probably come across as spammy or start a flame war, so I'll just leave it at this: we got a surprising explanation for the "no" we received from IB. As a result, I'd be very surprised if you ever see data dumps on WT, so try not to be too hard on folks who've forked to other projects. -- (WT-en) Jeremy (talk) 01:56, 12 September 2007 (EDT)
Actually, I would be quite interested in the reasons given...? (WT-en) Jpatokal 02:13, 13 September 2007 (EDT)
Me too, details please – (WT-en) cacahuate talk 02:17, 13 September 2007 (EDT)
Me three. The lack of available database dumps or even the rumored XML exports has been a sore point with contributors for a long time, although not so sore a point that most people have wanted to fork the project. What was your experience? -- (WT-en) Ryan • (talk) • 02:48, 13 September 2007 (EDT)


OK, I'll try to be brief and neutral. About a year ago I got back from a nine-month trip around the world and decided to start an online travel information company (Travature). I did it mostly because, in my own travels, I was frustrated by the lack of integration between different travel segments (flights, guides, etc.) and by the fact that nearly every travel company was trying to sell you something; nobody was just trying to help travelers. I really liked the base of Wikivoyage, and they seemed to share my view on free content for travelers. So I contacted IB to see if we could get a copy of the data and work out a partnership with my new company. We were (and are) CC-BY-SA, so I figured a modified version of the content would complement what we were building perfectly, and I felt that in the long run, because we shared licenses, we could give back to WT equally. However, IB only wanted to give us a 250-character stream, not the whole dataset. We wanted the full data because we were planning to build a very different type of wiki (think Wikivoyage meets Yelp meets Kayak). We weren't trying to divide the Wikivoyage community; we just wanted to build something completely different. My hope was to start with Wikivoyage content for guides and then, as Travature morphed into a completely different animal, to keep feeding back into the Wikivoyage community whatever they/you deemed useful from us. My original plan was to give WT credit prominently and to show our partnership at every opportunity. However, all my partnership plans fell apart: IB basically said that what we were doing wasn't in their interests, and so they declined to give us a dump. My perspective was that it wasn't "their" data, so it wasn't their place to decide who should and shouldn't get access to it.
Anyway, in the few months since, I've kept my mouth shut while my teams quietly worked on building our system. Today we finally finished the technical backend to do dumps of our own, so I felt I could finally, in good conscience, weigh in on the issue. -- (WT-en) Jeremy (talk) 02:53, 12 September 2007 (EDT)
I have a fork of Wikivoyage that we made about two months ago for a project I have now decided not to pursue. If anyone is interested in the project and in using it to create dumps or for other creative purposes, please let me know. You can see where we started the project at: http://siteground181.com/~travelh2/wiki/index.php?title=Main_Page . We are a commercial venture, but open to any new idea with merit. Note that the site is currently locked down, but only because we never really did anything with it. Best, (WT-en) Rbe2004 17:14, 8 November 2007 (EST)

Regarding use of a crawler to download content

Swept in from the pub

I am working on a project in data mining and machine learning in the travel domain, for which we need high-quality content to conduct some experiments. It would be great if we could have access to the data dumps. Alternatively, would it be OK if we crawl a section of this website? (WT-en) Software.research.work 08:52, 9 November 2009 (EST)

See Project:Terms of use#Spiders --(WT-en) inas 16:51, 9 November 2009 (EST)
If you want data dumps, then use Special:Export. —The preceding comment was added by 121.73.78.167 (talk · contribs)
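For a handful of pages, Special:Export can be scripted rather than used by hand. The following is a minimal sketch that assumes the English Wikivoyage host (https://en.wikivoyage.org) and the Special:Export/<Title> URL form, which returns the current revision of one page as MediaWiki export XML. The helper names export_url and fetch_export are hypothetical. Wikimedia's User-Agent policy asks scripts to identify themselves, so the fetcher requires a descriptive User-Agent string.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: English Wikivoyage; Special:Export/<Title> returns the current
# revision of that single page as MediaWiki export XML.
EXPORT_BASE = "https://en.wikivoyage.org/wiki/Special:Export/"


def export_url(title: str) -> str:
    """Build the Special:Export URL for one page title."""
    # MediaWiki URLs use underscores for spaces; keep ':' (namespaces) and '/' (subpages).
    return EXPORT_BASE + quote(title.replace(" ", "_"), safe="/:")


def fetch_export(title: str, user_agent: str) -> bytes:
    """Download the export XML for one page (requires network access)."""
    request = Request(export_url(title), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read()
```

This approach works page by page. For a whole-site copy, the monthly dumps at dumps.wikimedia.org are far more efficient than a loop of exports.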


Hi there

Sorry to hear about your dump trouble :(

I have added some dumps so you can download them :). They are dated 14 June 2010 at 3:30. Unfortunately the imagegetter tool didn't do a great job, so most of the images are missing. I can't find the source code for imagegetter.exe anywhere on the net, or who created it; if anyone knows, that information would be greatly appreciated. You can email me at allebone@gmail.com (Peter Allebone)

Complete files: (only download if you really really want to) http://dl.dropbox.com/u/63233/Wikivoyage/Complete%20zip/WikivoyageComplete14-June-2010.7z

Just the binaries for the wiki2touch app on the iphone: http://dl.dropbox.com/u/63233/Wikivoyage/Completed%20binaries%20for%20iphone/Completed%20binaries%20for%20iphone.7z

Interim dumps and logs (the XML file for other offline wiki readers): http://dl.dropbox.com/u/63233/Wikivoyage/Interum%20dumps%20and%20logs%20-%20the%20xml%20bzip%20file/Interum%20dumps%20and%20logs%20-%20the%20xml%20bzip%20file.7z

Source code and tools used: http://dl.dropbox.com/u/63233/Wikivoyage/Source%20Code%20and%20tools/Source%20Code%20and%20tools.7z

I hope this helps. I will try to make another dump next month :)

Pete - 15-06-2010

Hi

I added some more dumps. Please go to www.allebone.org to get them.

Kind Regards Pete - 22-07-2010


Thanks for this, you rock! (WT-en) Emijrp 08:39, 31 October 2010 (EDT)

WikiTeam

Hi. I'm glad to announce that we have made dumps of every Wikitravel language edition. (WT-en) Emijrp 10:11, 3 March 2012 (EST)

Dumps of English Wikivoyage?

Swept from the pub:

I want to run a breadcrumb integrity check script. So I need a fresh database dump for the English articles (at least the wikicode, images not needed). Where can I find one? Not found at [1] [2] [3]... I used to ask this question a lot on Wikitravel haha ;-) Thanks! Nicolas1981 (talk) 07:57, 16 October 2012 (CEST)

Dumps will be available in the future; for now only de:, it:, and shared: are available. We have a lot to do with the migration to the WMF right now, so dumps are not the top priority. --Unger (talk) 08:40, 16 October 2012 (CEST)
OK, thanks! Nicolas1981 (talk) 13:42, 17 October 2012 (CEST)
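A breadcrumb integrity check of the kind Nicolas1981 describes can be run over the wikitext in a dump. The following is a minimal sketch and not his actual script. It assumes that each article declares its parent region with the {{IsPartOf|Parent}} template used on English Wikivoyage, and that the pages have already been loaded as a title-to-wikitext mapping. The function names are hypothetical. It reports parents that don't exist and chains that loop back on themselves.

```python
import re
from typing import Dict, List, Optional

# Assumption: the parent is declared as {{IsPartOf|Parent}}; the first letter
# may be lower-case, since MediaWiki does not distinguish it in template names.
IS_PART_OF = re.compile(r"\{\{\s*[Ii]sPartOf\s*\|\s*([^|}]+?)\s*[|}]")


def parent_of(wikitext: str) -> Optional[str]:
    """Return the breadcrumb parent named in the page, or None."""
    match = IS_PART_OF.search(wikitext)
    return match.group(1).replace("_", " ") if match else None


def check_breadcrumbs(pages: Dict[str, str]) -> List[str]:
    """Report missing parents and breadcrumb loops for a title -> wikitext map."""
    parents = {title: parent_of(text) for title, text in pages.items()}
    problems: List[str] = []
    for title, parent in parents.items():
        if parent is not None and parent not in parents:
            problems.append(f"{title}: parent '{parent}' does not exist")
    # Walk each chain upwards; revisiting a title means the chain loops.
    for start in parents:
        seen = set()
        node: Optional[str] = start
        while node is not None and node in parents:
            if node in seen:
                problems.append(f"{start}: breadcrumb loop via '{node}'")
                break
            seen.add(node)
            node = parents[node]
    return problems
```

A chain that ends at a missing page is reported once, as a missing parent. Every page whose chain runs into a loop is reported separately, which makes it easy to see how many articles a single bad IsPartOf affects.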

XML dumps of other languages

Swept in from the pub

[Follow-up from here] Hi, if I understood this mailing list post correctly, some users have XML dumps of some Wikitravel language editions. As mentioned in later posts on the mailing list, new Wikivoyage subdomains can be created from the dumps if they are made available (that is, new Wikivoyage projects for languages such as Spanish or Portuguese, which already had content on Wikitravel; I have created incubator:Incubator:Wikivoyage import for an overview). So it would be great if someone could point me to someone who can provide these dumps :-) I'm an importer for new Wikimedia projects, so I'm trying to move this forward. --MF-Warburg (talk) 12:01, 15 November 2012 (UTC)

You could maybe try asking at de:User talk:DerFussi or de:User talk:Unger. It would be very nice to get those dumps. For the "incubator"-style pages on Wts, it might be easier to just use Special:Export and Special:Import, since there are so few pages. --Stefan2 (talk) 12:29, 15 November 2012 (UTC)
Thanks, I will ask them. Yes, I will use Special:Export on Wts. --MF-Warburg (talk) 12:32, 15 November 2012 (UTC)

Datadump?

Swept in from the pub

It's never too early to release data dumps. Are some available? If not, would it be possible to generate some? I really need the wikitext data to publish OxygenGuide (a lean copy of WT/WV for smartphones/notebooks, generated at the wikicode level), and while scraping is an option, I would much prefer a more efficient and sustainable approach. OxygenGuide does not include images or anything complex, so Wikivoyage is already perfect for it. Freshness of the data dumps was one of the major points of discontent with Wikitravel. Nicolas1981 (talk) 08:08, 16 November 2012 (UTC)

Hello Nicolas1981, database dumps are generated automatically at least once a month. The most recent dump is from 6 November. Database dumps can be found at dumps.wikimedia.org/, and the English Wikivoyage dumps are available at dumps.wikimedia.org/enwikivoyage/. Greetings - Romaine (talk) 08:53, 16 November 2012 (UTC)
en.WV was moved to WMF on November 10; a 6 November dump is therefore empty... and useless. K7L (talk) 13:36, 16 November 2012 (UTC)[reply]
Within a month a filled dump will be available. Romaine (talk) 00:58, 17 November 2012 (UTC)[reply]
Dumps are ready :-) http://dumps.wikimedia.org/enwikivoyage/20121118/ Nicolas1981 (talk) 03:18, 19 November 2012 (UTC)[reply]
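Once a dump is downloaded, the pages-meta-current.xml.bz2 file can be read page by page without decompressing it to disk. The following is a minimal sketch that uses Python's standard bz2 module and xml.etree.ElementTree.iterparse. dump_url assumes the directory and file naming visible in the dump listings (for example enwikivoyage-20180820-pages-meta-current.xml.bz2). The helper names are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def dump_url(date: str, wiki: str = "enwikivoyage") -> str:
    """URL of the current-revisions dump for a given dump date (YYYYMMDD)."""
    # Assumption: naming pattern as seen on dumps.wikimedia.org listings.
    return f"https://dumps.wikimedia.org/{wiki}/{date}/{wiki}-{date}-pages-meta-current.xml.bz2"


def local_name(tag: str) -> str:
    # Strip the "{namespace}" prefix carried by MediaWiki export XML tags.
    return tag.rsplit("}", 1)[-1]


def iter_pages(path: str) -> Iterator[Tuple[str, int, str]]:
    """Stream (title, namespace number, wikitext) for each <page> in the dump."""
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, ns, text = "", 0, ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "ns":
                    ns = int(child.text or 0)
                elif name == "text":
                    text = child.text or ""
            yield title, ns, text
            elem.clear()  # release this page's subtree before moving on
```

Filtering on ns == 0 keeps only articles. A "current versions only" dump carries one revision per page, so each page yields a single wikitext string.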

Number of bytes

Swept in from the pub

You know, I would be interested to see whether the number of bytes in Wikivoyage increases or decreases over time. Is there any way to find that out? The question is whether we're adding content as fast as we're removing listings for places that have closed, etc. It would just be interesting. ---Selfie City (talk | contributions) 01:57, 28 August 2018 (UTC)

There is a Wikivoyage:Database dump with a few old versions going back to April:
2018-04-03 19:41:56 done All pages, current versions only. enwikivoyage-20180401-pages-meta-current.xml.bz2 117.6 MB
2018-08-22 02:23:26 done All pages, current versions only. enwikivoyage-20180820-pages-meta-current.xml.bz2 120.8 MB
The net trend does seem to be "up", but that doesn't tell us whether that is because of new articles, expansion of existing pages or both. K7L (talk) 02:26, 28 August 2018 (UTC)
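The sizes listed above are for bz2-compressed files that cover all namespaces. The rise from 117.6 MB to 120.8 MB (about 3.2 MB, roughly 2.7%) is therefore only a proxy for article growth. To separate new articles from expansion of existing ones, you could compare per-article wikitext sizes between two dumps. The following is a minimal sketch that assumes both dump files are downloaded locally. It counts only main-namespace (ns 0) pages, skips redirects (which carry a <redirect> element in export XML), and measures wikitext in UTF-8 bytes. The function names are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Dict


def local(tag: str) -> str:
    # Drop the "{namespace}" prefix carried by MediaWiki export XML tags.
    return tag.rsplit("}", 1)[-1]


def article_sizes(path: str) -> Dict[str, int]:
    """Map each ns-0, non-redirect page title to its wikitext size in UTF-8 bytes."""
    sizes: Dict[str, int] = {}
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local(elem.tag) != "page":
                continue
            # Index the page's descendants by tag name (duplicate tags such as <id> overwrite).
            fields = {local(child.tag): child for child in elem.iter()}
            ns = fields.get("ns")
            if ns is not None and ns.text == "0" and "redirect" not in fields:
                text = fields.get("text")
                body = text.text if text is not None and text.text else ""
                sizes[fields["title"].text or ""] = len(body.encode("utf-8"))
            elem.clear()  # release this page's subtree
    return sizes


def compare(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, int]:
    """Split the net byte change between two snapshots into its sources."""
    added, removed = new.keys() - old.keys(), old.keys() - new.keys()
    result = {
        "new_pages": len(added),
        "deleted_pages": len(removed),
        "bytes_in_new_pages": sum(new[t] for t in added),
        "bytes_in_deleted_pages": sum(old[t] for t in removed),
        "change_in_existing_pages": sum(new[t] - old[t] for t in old.keys() & new.keys()),
    }
    result["net_change"] = (
        result["bytes_in_new_pages"]
        - result["bytes_in_deleted_pages"]
        + result["change_in_existing_pages"]
    )
    return result
```

Running compare(article_sizes(april_path), article_sizes(august_path)) on the two 2018 files would split the net change into bytes from new pages, bytes lost with deleted pages, and growth or shrinkage in pages present in both dumps. A renamed page shows up as one deletion plus one new page.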
Keep in mind that quality is more important than quantity as well, weeding out clutter can improve articles although it decreases the database size. 94.119.64.18 11:56, 5 September 2018 (UTC)