Wikivoyage talk:RDF Expedition

Purpose

I would like to strongly suggest that we partition these problems vertically rather than horizontally. That is, I'd love to see us talking about particular application domains, and how to deal with each of them in the "4 steps" outlined on this page, but one page per problem. For example, we've got Project:geocoding, which deals with lat/long data from top to bottom. --(WT-en) Evan 16:07, 22 Nov 2005 (EST)

Why do you think that would be better? I suppose some problems will require a similar vocabulary. If you look at each problem in isolation, you will probably define redundant vocabulary that would only add to the confusion. Another point is that you may lose sight of related aspects if you only look at your own very specific problem. For example, concerning your Template:Geo, it might have been worth thinking about the possibility of providing information about the accuracy, as proposed in Project:RDF Expedition Collection of wanted features#Article status. Or look at your Template:IsIn. You have limited it to places. Wouldn't it also be useful to have bread-crumb navigation for other kinds of articles, e.g. Manual of style -> article templates -> Big city article template -> Quick big city article template?
You spread both templates very quickly across many, many pages after having shown them as an example in Project:RDF#RDF and templates. I didn't see a reason for that. Why didn't you want to wait until there had been a discussion about it? Maybe I overlooked it. At least, you didn't refer to any discussion in your Project:Geocoding, in Template talk:Geo, or in Template talk:IsIn.
I'm convinced that in this complicated matter the straightforward way may very easily result in follow-up confusion and corrections that are actually unnecessary. For sure, it's the fastest way to get some results, but not always the best ones.
Anyways, I do not insist on finding vocabulary for everything proposed in step 1 before opening step 3. But for now, I think it is still too early. Maybe we (I?) could focus on some clusters of problems and open step 3 successively for certain problems.
-- (WT-en) Hansm 16:30, 22 Nov 2005 (EST)
So, firstly, we have two big pages with like 20 sections (which link to the corresponding section on another page). I'd find it easier to read and work with 20 pages with 2 sections. It's a data organization issue. To be honest, I just haven't read these pages because they're too long and intense for me. Splitting them up is going to make it easier for me to read them one at a time.
Second, I don't think this expedition will ever be "done", and I think it's more useful to set up a framework for proposing new domains of information and ways to deal with them. I think short, sharp little projects ("What about lat/long encoding? How will we add it? What vocabulary will we use? Do we want to have any other UI? OK... go!") are going to be much better than trying to deal with everything at once. I think it's entirely possible to draw from experience with other problem domains without having everything listed on one big page.
Third, a great thing about using templates is that we can change the implementation of the vocabulary "under the hood". If you think that the vocabulary for Template:Geo is wrong, please, plunge forward. It will change hundreds of pages instantly, but that's probably a good thing.
Fourth, I see less value in hierarchical navigation of non-geographic subjects, but I'm willing to re-implement. We could use another template ("is subtopic of", say) to represent conceptual rather than geographical hierarchy. The output (breadcrumb navigation) will be the same, but the semantic content will be different. Where should we talk about that?
Fifth, I'm confused about the geo accuracy information. Do you want a way to say, "the lat/long of Berlin is X/Y, but that's inaccurate to 50 meters", or do you want to say "Berlin is bounded by these points: (lat/long, lat/long, lat/long, ...)"? Or do you want to say, "Berlin is centered at X/Y, with a radius of 10 km"?
Anyways, I'd really, really like to address each of these topics separately on its own page. --(WT-en) Evan 09:35, 23 Nov 2005 (EST)
Since I didn't get convincing arguments against my basic concern, I repeat: I'm convinced that in this complicated matter the straightforward way may very easily result in follow-up confusion and corrections that are actually unnecessary. For sure, it's the fastest way to get some results, but not always the best ones.
If templates were given too few or inappropriate parameters, there is no use in changing the templates themselves. I fear that at this basic point a difference in mentality and way of working becomes apparent. Maybe you will get a better response for the expedition when doing it your way: plunge forward!
-- (WT-en) Hansm 10:21, 23 Nov 2005 (EST)

Proposal

Swept in from the Project:Travellers' pub:


RDF Expedition

Evan has implemented his MediaWiki extension that allows authors to put RDF information into every article. This is done using the Turtle syntax, which is still too complicated for the average author. Thus, we will need a set of templates that hide the actual Turtle code. The use of templates should be lucid and easy for every author. I'd propose to start an RDF Expedition with the following goals:

  1. Finding possible uses or goals for RDF meta data.
  2. Collecting all kinds of information that should be represented in RDF meta data in order to reach these goals.
  3. Elaborating a set of easy-to-use templates.

If not as an expedition, we will at least need some central point where a concept can be discussed. IMHO, it wouldn't make too much sense to begin the discussion from the bottom (i.e. on Template talk:IsIn). The matter is complicated enough that we need clear ideas about the general goals first. Then we can think about how to make the templates. Take Evan's Template:IsIn as an exception. He started with the easiest case, partly to demonstrate his general ideas on how RDF could be used at all.
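
To illustrate the idea of a template hiding Turtle from authors, here is a minimal Python sketch of the text a template such as Template:IsIn might expand to. The helper name `is_part_of_turtle` is hypothetical, and both the `urn:x-wikivoyage:en:` naming scheme and the use of `dcterms:isPartOf` are assumptions modelled on the RDF line quoted further down this page; the actual template and extension may emit something different.

```python
# Hypothetical helper: not part of the actual extension or any real template.
DCTERMS = "http://purl.org/dc/terms/"


def is_part_of_turtle(parent: str, lang: str = "en") -> str:
    """Render one Turtle triple saying "this page is part of <parent>"."""
    # Assumed URN scheme, modelled on urn:x-wikivoyage:en:Oceania.
    target = f"urn:x-wikivoyage:{lang}:{parent.replace(' ', '_')}"
    # "<>" is Turtle's relative reference to the current document.
    return f"<> <{DCTERMS}isPartOf> <{target}> ."


print(is_part_of_turtle("Chubu"))
# <> <http://purl.org/dc/terms/isPartOf> <urn:x-wikivoyage:en:Chubu> .
```

The author would only ever type the template call with one parameter; the triple itself stays out of sight.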

Opinions? -- (WT-en) Hansm 03:10, 18 Nov 2005 (EST)

I think it sounds great! I'll take a poke at starting the page this morning. --(WT-en) Evan 10:54, 19 Nov 2005 (EST)
Fine. I'm curious. Perhaps somebody else who'd like to join? -- hansm
I have started the expedition. Tell us what would be nice to have. -- (WT-en) Hansm 10:01, 21 Nov 2005 (EST)

Everything will have to be coded?

Am I right in my understanding that every RDF idea will eventually have to go back to Evan to get coded? I mean, I just got an idea for a Template:isAt to be used in user pages to declare the user's physical location. If we want to generate Project:Wikivoyagers by location automatically from this RDF, Evan will have to code it for us, right? If so, we've hit a bottleneck, as Evan is never going to have the bandwidth to do all those nifty things. --(WT-en) Ravikiran 02:32, 31 Jan 2006 (EST)

Actually, Evan's main interest right now is doing these nifty things, so please don't worry about that.
My main problem is the format of this expedition: doing all the requirements first, then all the template and RDF design, then all the coding.
I'm going to try to split these problems up into chunks, so that they can be addressed individually. If there's overlap (say, two different problem sets can use the same RDF data), great. But I just can't deal with doing everything at once. --(WT-en) Evan 11:29, 31 Jan 2006 (EST)

Saving something

I'm archiving this, because there are important links I have to mine. --(WT-en) Evan 11:38, 31 Jan 2006 (EST)

  1. The first step would be a brainstorming-like collection of all features that would be nice to have. At this stage, we should not concern ourselves too much with how to build them; just list whatever you have ever dreamed about.
  2. In the second step, we have to think about the RDF information itself: How could it be represented? What is an appropriate vocabulary? What vocabulary is already defined elsewhere? What do we have to define specially for Wikivoyage? Find good definitions. Where should the information go? Maybe many more questions.
  3. Step three. When we have found a suitable vocabulary and know what RDF information should go to what place (article, extra control page, etc.), we can build templates, hopefully in a straightforward way.
  4. Step four may be our concern later.

Steps 1 and 2 might perhaps be done partly simultaneously. If you propose some feature and already have an idea about how the info might be represented, tell us both. Nevertheless, we will open one page for each step.

Projects

Hey, I like this 'project' setup as a subset of the bigger RDF Expedition. I think it fits well with the wiki-style 'incremental-improvement' methodology. Our needs are going to evolve along with the content and it'll be easier to keep up if we're a little lighter on our feet... (WT-en) Majnoona 13:01, 31 Jan 2006 (EST)

Dating information, keeping current

Swept in from the Pub:

Do we have a convention for tagging information by date, so that we know when it's getting stale? For instance, if I read that a certain bus costs USD $10 as of one month ago, I can be confident it's likely the same now. But if the information is two years old, I have less confidence, and a local editor may want to find such facts and update them.

Wikipedia has a convention of articles with the title As of (year), e.g. the Wikipedia.org As of 1990 article. From that page, you can see which articles link there, and those are candidates for updating. For instance, I can see that the Georgia (country) article has 1990 census data, and I may know of a more recent census than that.

Any such convention for Wikivoyage? Thanks! (WT-en) JimDeLaHunt 13:46, 24 Dec 2005 (EST)

Move to shared

I want to move this and all RDF related policy pages to shared. Comments welcome. — (WT-en) Ravikiran 01:51, 8 October 2006 (EDT)

IsIn failure

Swept in from the pub:

I see that Aichi has an {{isIn|Chubu}} template on the page, as it should, but the breadcrumb navigation doesn't appear at the top of the page. I tried purging the cache and it made no difference. Can anyone see the problem? --99.140.179.169 21:36, 20 June 2008 (EDT)

Hmm, as of right now the breadcrumb navigation displays without any problems. --(WT-en) Peter Talk 02:00, 21 June 2008 (EDT)
After purging the cache for a page the page will reload, but without the IsIn... if you then click on the "article" tab to reload the article, you should see it – (WT-en) cacahuate talk 16:10, 21 June 2008 (EDT)

Querying {isIn| } tags from the top down

Swept from pub:

I have noticed that at the top of each article's page, there is an expanded flow chart of links which shows which category the article falls under.

For example the article on South_Coast_(New_South_Wales) has

Oceania : Australia : New South Wales : South Coast at the top of its page.


If I understand correctly, this feature is achieved by including the tag {{isIn|New_South_Wales}}.


I was wondering if there was a way to query the MediaWiki API (or another method) to find out all the articles which fall under one of the categories.

For example, querying the category of Europe would give (amongst others) the pages Europe : Central Europe : Switzerland : Basel (region) : Basel and so on.


Any ideas?

Thanks. (WT-en) Sirtrebuchet 13:50, 20 March 2009 (EDT)

Short answer is "I'm not sure", but see Project:Breadcrumb navigation and Project:RDF if you want to try... (WT-en) Jpatokal 23:22, 20 March 2009 (EDT)
OK, so I have done some investigation work and I have discovered that Project:RDF is what I was looking for. However, I was wondering if there was documentation on how the interface on the Project:RDF page worked? Thanks. (WT-en) Sirtrebuchet 00:29, 23 March 2009 (EDT)
I can't see how the isPartOf RDF relationship we establish is going to generate any inverse relationship. And we don't define a relationship between the region and the subregions. So, I can't see how it would be possible, using RDF or other means, to get the information without accessing every article page to find the region it is contained within. Once you have done that, accessing the RDF info for each page is fairly trivial. --(WT-en) Inas 01:04, 23 March 2009 (EDT)
After some deeper investigation into this, I guess plotting inverse relationships is going to be harder than I thought. What I was hoping to do was to start at a general page, say Europe, and categorise (i.e. make a list of) all the articles which pertain (i.e. are linked in an upward manner) to Europe. I thought I could just see what was linked to Europe and then what was linked to that and so on, but now I realise that that would end up linking to all pages and just give a web of links to the entire wiki. I was hoping to get just articles pertaining to that region.
So to accomplish what I am wanting to do, one would have to check each page for the link hierarchy back to the top (random example Paros -> Cyclades -> Greek Islands -> Greece -> Europe) and then make a list from that in reverse order to get Europe -> Greece -> Greek Islands -> Cyclades -> Paros and then extrapolate for all links making their way to Europe. The only problem here being that not every article has such linking information. (WT-en) Sirtrebuchet 01:40, 23 March 2009 (EDT)
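
The approach described above (collect each page's upward link, then reverse it) can be sketched in Python. This is only an illustration: the `PARENT` table is hand-written sample data taken from the examples in this thread, the function names are hypothetical, and it assumes every page has at most one parent and that the links contain no cycles.

```python
from collections import defaultdict

# Illustrative child -> parent pairs, as isIn/isPartOf would record them.
PARENT = {
    "Paros": "Cyclades",
    "Cyclades": "Greek Islands",
    "Greek Islands": "Greece",
    "Greece": "Europe",
    "Switzerland": "Central Europe",
    "Central Europe": "Europe",
    "Australia": "Oceania",
}


def invert(parent_of):
    """Turn child -> parent links into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def descendants(root, children):
    """Return every article below root (assumes the links form a tree)."""
    found = []
    stack = list(children.get(root, []))
    while stack:
        page = stack.pop()
        found.append(page)
        stack.extend(children.get(page, []))
    return found


def breadcrumb(page, parent_of):
    """Walk upward from page, then reverse to get the top-down trail."""
    trail = [page]
    while trail[-1] in parent_of:
        trail.append(parent_of[trail[-1]])
    return list(reversed(trail))


print(breadcrumb("Paros", PARENT))
# ['Europe', 'Greece', 'Greek Islands', 'Cyclades', 'Paros']
print(sorted(descendants("Europe", invert(PARENT))))
# ['Central Europe', 'Cyclades', 'Greece', 'Greek Islands', 'Paros', 'Switzerland']
```

Pages without any upward link simply never appear under a root, which matches the limitation noted above.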
Virtually all articles do have isIn/isPartOf templates, only the very stubbiest of stubs don't (and omitting them is no great loss). (WT-en) Jpatokal 01:52, 23 March 2009 (EDT)
Using the api, is there a way to check for the isIn/isPartOf tags? I had a look but there are a lot of options to choose from. Thanks. (WT-en) Sirtrebuchet 02:04, 23 March 2009 (EDT)
Both the isin and ispartof templates just define an RDF ispartof relationship. So, if you look in the RDF definition for an article, just pick out the ispartof reln, and there you have it. So, for example, you get the RDF XML for Australia at [1]. You then just parse the XML to get the ispartof reln. In this case, that is the <dcterms:isPartOf rdf:resource="urn:x-wikivoyage:en:Oceania"/> line. --(WT-en) Inas 08:12, 23 March 2009 (EDT)
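
A minimal Python sketch of that parsing step, using only the standard library. The `SAMPLE` document is hand-written around the one line quoted above, so the surrounding elements are an assumption and the real RDF output for an article may look different; `is_part_of` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Hand-written sample; only the dcterms:isPartOf line comes from the thread.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:x-wikivoyage:en:Australia">
    <dcterms:isPartOf rdf:resource="urn:x-wikivoyage:en:Oceania"/>
  </rdf:Description>
</rdf:RDF>"""


def is_part_of(rdf_xml):
    """Return the rdf:resource value of every dcterms:isPartOf element."""
    root = ET.fromstring(rdf_xml)
    return [
        element.get(f"{{{RDF_NS}}}resource")
        for element in root.iter(f"{{{DCTERMS_NS}}}isPartOf")
    ]


parents = is_part_of(SAMPLE)
print(parents)
# ['urn:x-wikivoyage:en:Oceania']
print([urn.rsplit(":", 1)[-1] for urn in parents])
# ['Oceania']
```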
Thanks for pointing that out. One could also look at the page source and seek out the line which begins with <div id="contentSub"> which is the html of the IsIn tag and it includes the complete IsIn hierarchy right back to the top which could be parsed. For example, a parsing of this line of html from the Australia article could give Oceania, Australia. (WT-en) Sirtrebuchet 17:22, 23 March 2009 (EDT)
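
As a rough Python sketch of that HTML route: collect the text inside the div with id "contentSub" and split it on the " : " separator seen in the breadcrumb trail. The `SAMPLE` markup is invented for illustration (the real page source may nest links or other elements differently), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ContentSubText(HTMLParser):
    """Collect the text found inside <div id="contentSub">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside contentSub, otherwise <div> nesting depth
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == "contentSub":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def breadcrumb_from_html(html):
    """Return the breadcrumb names, top of the hierarchy first."""
    parser = ContentSubText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return [part.strip() for part in text.split(" : ") if part.strip()]


# Invented markup, assumed to resemble the Australia article's breadcrumb line.
SAMPLE = (
    '<div id="contentSub"><a href="/en/Oceania">Oceania</a> : Australia</div>'
    '<div id="other">Ignored : text</div>'
)
print(breadcrumb_from_html(SAMPLE))
# ['Oceania', 'Australia']
```

Splitting on " : " would misread any page name that itself contains that separator, so the RDF route is likely the more robust of the two.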

Would it be possible to attach some RDF code to the Regionlist template? Not all region articles use it, but I think it should be a goal to have them do so (if they have subregions). (WT-en) LtPowers 08:32, 23 March 2009 (EDT)

Adding the RDF relation to the regionlist template is very straightforward. Making sure a significant percentage of the destination articles would be captured within the hierarchy is more problematic. Although this is done well at a high level, I suspect we would miss many articles using this method. It may be useful for other reasons though. It could give us a nice progress indicator for developing the regional hierarchy.
On a related issue, it is still difficult to tell from the RDF if an article is a destination article, as opposed to a travel topic or itinerary. There is no RDF reln to identify itineraries or travel topics, and there is only RDF for cities, countries, etc. in newer, more specific templates. This may frustrate what (WT-en) Sirtrebuchet is trying to do. Are there many itineraries or travel topics that use the isin|ispartof template? --(WT-en) Inas 18:26, 23 March 2009 (EDT)
They should all use Template:Related; the breadcrumb navigation is supposed to be purely for the geographical hierarchy of destinations. --(WT-en) Peter Talk 18:42, 23 March 2009 (EDT)

RDF/IsPartOf, are we serious, do we need it on districts?

There is an issue currently in the pub, and on the Chicago districts, about whether we need to have isPartOf RDF relns defined on districts. From the point of view of breadcrumb navigation we don't - the navigation works okay from the article name syntax. From the point of view of humans we don't; it is apparent from the district name that the district is a subregion. However, if we want to support the RDF reln, then we do, because no ispartof RDF reln is defined by anything other than the ispartof template. So, the question is - do we care about the RDF anymore? Or has this expedition failed, and we only care about the breadcrumb navigation? --(WT-en) inas 19:27, 23 June 2009 (EDT)

Given that no one seems to have a strong opinion on the RDF stuff, I'll just add the isPartOf reln to the districtguide template, and this will formalise that the isPartOf template is not required on districts. The isPartOf RDF information will be there if anyone ever wants to make use of it. If someone decides one day that the district articles do need an IsPartOf template, they can simply remove the RDF from the districtguide template. If anyone has any issues, express them, or I'll proceed to update the template, and update the breadcrumb navigation stuff to say there is no need for using the template on districts. --(WT-en) inas 01:45, 10 July 2009 (EDT)
One drawback is that {districtguide} tends to get left out in cases where the district name isn't meaningful by itself – for example, Kyoto/South doesn't use that template, because then it would turn the page's title into just "South travel guide". The article does have the proper breadcrumb navigation, which I had assumed was RDF machinery behind the scenes – and if not, perhaps it should be? Pages in the main namespace with slashes in the title are (or at least should be!) always districts, making them predictable and presumably automatable. - (WT-en) Dguillaime 02:07, 10 July 2009 (EDT)
Can you explain what you mean? As far as I can tell, all the districtguide template does is use RDF to mark it as a district - which is the correct thing to do, because all our articles in theory should be RDF marked as either cities, countries, or districts, and so on. If adding a districtguide breaks something, it should be fixed. I don't agree that having a slash should always automatically add an ispartof RDF reln. I can see a possibility one day that we could use subpages that don't necessarily imply a geographical container, so hardcoding it seems wrong, somehow. And, yes, that means the breadcrumb navigation is also done wrong, but I don't fancy my chances with a tech request. --(WT-en) inas 02:29, 10 July 2009 (EDT)
I've heard stories of a tech team – rumors, perhaps, or the mythology of a bygone era.
RDF is not a language I know, but the difference can be demonstrated with another example: Kyoto/South, which does not use the template, has an HTML page title of "Kyoto/South"; but Kyoto/North does use {districtguide}, and its title becomes "North travel guide". This behavior is sometimes desirable, but only if the district name has an independent meaning (as an example in the same vein, Kyoto/Arashiyama). Perhaps a feature that could be spun off into a separate template? - (WT-en) Dguillaime 02:52, 10 July 2009 (EDT)
Well, I guess we already have a different template to set isPartOf relationships, which is (of course) isPartOf. I started down this trail when I tried adding isPartOf templates that were missing to a couple of districts, to make sure that the isPartOf RDF was correct. However, they were removed as unnecessary on districts, which for breadcrumb navigation appears to be true. It does seem wrong that we should not be able to use the districtguide template on a district, though. If we can't use the template, then it sort of makes it pointless having any of the cityguide, countryguide templates if they are going to be purposefully omitted from some articles, and you won't be able to use RDF to indicate an article type. I'll try and play around and figure out exactly what is going on. Thanks --(WT-en) inas 20:05, 12 July 2009 (EDT)
It seems to happen whenever you add any RDF to a page: it shortens the heading. It doesn't matter whether you add an isPartOf tag, or if you add a district tag. Just defining the page in RDF appears to be sufficient to make it happen. I suspect the logic in the title display is saying that if the pagename appears to be a URL, then drop everything from the beginning of the string up to the last slash. Still, it isn't reasonable to not add any templates to district articles just to avoid this consequence. If we see it as a bug, it should be filed with the tech team. I'll hack around a bit more to see if there is a workaround. --(WT-en) inas 20:25, 12 July 2009 (EDT)
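
The suspected behaviour can be written down as a small Python sketch. This is a guess reconstructed from the observations in this thread, not the actual Wikivoyage code; `displayed_title` and its `has_rdf` flag are hypothetical, and the sketch ignores the " travel guide" suffix mentioned above.

```python
def displayed_title(page_name: str, has_rdf: bool) -> str:
    """Model the observed page title, given whether the page carries any RDF."""
    if not has_rdf:
        # Without RDF the full page name is shown.
        return page_name
    # With any RDF, everything up to and including the last slash is dropped.
    return page_name.rsplit("/", 1)[-1]


print(displayed_title("Kyoto/South", has_rdf=False))  # Kyoto/South
print(displayed_title("Kyoto/South", has_rdf=True))   # South
print(displayed_title("Chicago/Loop", has_rdf=True))  # Loop
```

Written this way, the oddity is easy to see: the title depends on a flag that has nothing to do with naming.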

It is clearly a bug. I've logged it as a tech request (for what it is worth). I understand that it is nice to have the full district name in the page title - sometimes - but to do that currently means that we can have no RDF on the page at all, and we really should give up on RDF if we say that there is a subset of pages we are not going to permit RDF to be used on. It becomes useless.

So, assuming we are not about to give up entirely on RDF, we come back to the original choice.

Is it preferable to put the RDF code into the districtguide template, or to put ispartof templates on district guides? --(WT-en) inas 02:21, 16 September 2009 (EDT)

I'm not sure I've quite grasped the implications of the proposed change. Correct me if I'm wrong: the change would fix the problem of no RDF identifying district articles, but would alter the display of district names (e.g., Chicago/Loop → Loop). Am I missing anything? --(WT-en) Peter Talk 21:56, 17 September 2009 (EDT)
Okay, a summary. RDF is currently used for breadcrumb navigation in destination guides. The RDF isPartOf is embedded in the isPartOf (and isIn) template.
However, in district articles (subpages), the RDF isn't used for breadcrumb navigation. The modified wiki software just assumes that subpages are contained within their parent pages, and doesn't use any RDF at all. Adding an isPartOf template to a district has no effect on breadcrumb navigation. It is just ignored.
This has led to the situation where we use RDF for breadcrumb navigation in destination guides, but we don't generally use it in district guides.
However, we have RDF implemented to make the meta-information easily readable to computers. So, for example, if you wanted to write a quick program to calculate a geographical hierarchy to identify all articles in Europe, you couldn't just use the RDF isPartOf relation, because it isn't in most of the district guides - they would be missed.
So, a couple of options
Firstly, we could forget RDF altogether - it is obviously not being used much (at all?) at the moment, so perhaps we just ignore it and don't worry about it. People who want to parse articles automatically can just try and parse the web page directly without RDF. If we choose this path, we should consider closing off or suspending the expedition for now.
Secondly, we could incorporate the RDF into the districtguide template. That way we wouldn't need to add isPartOf to district guides directly.
Thirdly, we could add isPartOf templates to district guides.
We need to be aware that there is a bug in the wikivoyage mods, which means that when you add RDF - any RDF - to a subpage, the title of the subpage (which appears in the titlebar of the web browser) changes from the full page name to just the subpage name. This is fine for a district like Chicago/Loop, because the page title just becomes "Loop". It is a bit more of a hassle for Kyoto/South because the page title just becomes "South". To me it is unclear which behaviour is correct. Should the page title be the subpage name or the full name? In any event, it is a bug that it changes just because you make a completely unrelated change to the page.
The bug report is on wts:WtTech:RDF_embedded_in_page_alters_page_title_on_subpages --(WT-en) inas 22:45, 17 September 2009 (EDT)
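
The gap described in this summary (district guides carrying no isPartOf RDF while the software infers containment from the subpage name) can be illustrated with a short Python sketch. The function `parent_of` is hypothetical; it prefers an explicit RDF parent and otherwise falls back to the subpage rule, which is exactly the hardcoded assumption that was questioned earlier in this thread.

```python
def parent_of(title, rdf_parent=None):
    """Return the containing page for title, or None at the top level."""
    if rdf_parent:
        # An explicit isPartOf relation wins when the page has one.
        return rdf_parent
    if "/" in title:
        # Fallback: treat a subpage as contained in its parent page.
        return title.rsplit("/", 1)[0]
    return None


print(parent_of("Aichi", rdf_parent="Chubu"))  # Chubu
print(parent_of("Chicago/Loop"))               # Chicago
print(parent_of("Europe"))                     # None
```

A program relying on RDF alone would get nothing for Chicago/Loop, which is why the second and third options above both put the relation into RDF one way or another.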
That's not a bug. It is actually intended behaviour. Unfortunately, it often leads to unintended consequences. Also, it has nothing to do with the district page problem. For example: South (United States of America) displays the title as "South travel guide" which is meaningless for someone who doesn't instinctively associate "South" with the US South. The cleanest solution, IMHO, is to have the isPartOf RDF on all pages, because as you point out, otherwise it will be annoying for anyone who wants to build a geographical hierarchy. We should also have another tag that enables us to override the title. Sadly, the second part will require technical assistance, which we are unlikely to get. — (WT-en) Ravikiran 01:05, 18 September 2009 (EDT)
It may not be a bug to use the subpage name as the page title. It may not be a bug to use South as the page title for South (United States of America). What is a bug is for the behaviour to vary depending on whether there is any RDF on the page. If there is no RDF, the full name displays as the title; if there is some RDF, then only the shortened title displays. Either behaviour may be considered correct or intended, but the variation of the behaviour depending on whether or not you use RDF is most certainly a bug. Have a look at Kyoto/South: the full name is in the page title now, i.e. Kyoto/South. Add any RDF to the page (you can use the isPartOf template, but any RDF will do) and the page title will change to just South. That is a bug. --(WT-en) inas 01:22, 18 September 2009 (EDT)
And I just tried the same thing with South (United States of America). If you remove the isPartOf template at the bottom, the page title displays as the full name, and not the abbreviated version. It seems we can have the full page titles, or RDF, but not both. --(WT-en) inas 01:26, 18 September 2009 (EDT)

I've updated the documentation on the template page. --(WT-en) inas 21:40, 8 November 2009 (EST)