Wikivoyage talk:Copyright-related issues

This is spam, please revert it

A spam bot likes to spam the first section of this article, so this section has been added as a way of notifying editors that a spam contribution was made. -- (WT-en) Ryan 16:49, 17 April 2006 (EDT)

I'm translating the Copyleft page into German. A big legal question has arisen for me: which law applies when a German contributor transmits his work to a server in the USA, German or US law? This is not unimportant, since there are also German and Austrian adaptations of the Attribution-ShareAlike 1.0 licence. Where should I link now? -- (WT-en) Hansm 09:05, 2004 Sep 13 (EDT)

Edirectory.com

Swept in from the Project:Travelers' pub:

There is a lot of material from edirectory.com (http://en.wikivoyage.org/w/index.php?title=Special:Contributions&target=210.214.89.130). As there are also a lot of references to edirectory (against ext guide policy), the anon user just might be the copyright holder, but I doubt it. Also, he renames Learn to Money, etc. --(WT-en) elgaard 10:19, 4 Jul 2005 (EDT)

That's a clear-cut case for reversion. He has to actually say he's the copyright holder in order to do this. It could always be recovered if he says he's the copyright holder. -- (WT-en) Colin 13:20, 4 Jul 2005 (EDT)

See also

If you are tempted to remove the following links, please pause to think: research into what is actually legal for us to copy is necessary, and these are valuable research leads and references. Please do not remove them from this talk page.

  • Wikivoyage uses a Creative Commons licence while Wikipedia uses the GNU Free Documentation License. Nevertheless, much relevant and well-presented research about copyright can be found on Wikipedia's copyright page and its discussion page: http://en.wikipedia.org/wiki/Wikipedia:Copyrights
  • Universal Copyright Convention as revised at Paris on 24 July 1971:
Section by section: http://www.unesco.org/culture/laws/copyright/html_eng/page1.shtml
As one whole (searchable) page: http://palimpsest.stanford.edu/bytopic/intprop/pariscnv.html

Upload

Swept in from the pub

The upload process for Wikivoyage has licensing options for Creative Commons 3.0, etc., but not 4.0. This should be added so giving appropriate credit is possible. Thanks. --Comment by Selfie City (talk | contributions) 01:09, 15 February 2019 (UTC)

I suppose the list just hasn't been updated, so I am adding the licence now. Could somebody who does local uploads check that the change works as it should?
I do not understand the "so giving appropriate credit is possible" part. If you upload images taken by others, then there are a zillion possible licences. Is there something in the CC-BY-SA 3.0 that hinders giving "appropriate credit"?
If there are technical issues, the talk page is MediaWiki talk:Licenses.
--LPfi (talk) 07:30, 15 February 2019 (UTC)
I left the old version. I suppose the selection of licences is small by design, so having two versions of the same licence could be regarded as redundant. On the other hand, CC-BY-SA 3.0 is the licence used for text on WV, WP etc., and I for one have not studied the changes introduced in the new version and am thus not confident enough to use it for general licensing. --LPfi (talk) 07:35, 15 February 2019 (UTC)
Thanks for adding CC4.0. The selection now implies that CC2.0 is only for Flickr images when it could be used for those based on older Commons files (maybe it has always done this). It is missing a public domain licence, although if an image is public domain, then you are free to upload it using one of the CC licences. I don't think that we have much use for the more specialised licenses that Commons allows. It would be good to have a page explaining licenses linked from the upload page. AlasdairW (talk) 14:25, 15 February 2019 (UTC)
I just added the 4.0 ones at "Creative Commons License". I supposed the Flickr licences were Flickr-specific. Having the same licences twice (once for licence source, once for image source) makes little sense, so I merged the lists.
Licensing a PD image with a CC licence is copyright fraud. While it seems not to be criminal in the USA, I think we should not do something that seems like encouraging the practice. The question is whether {{PD}} is enough or whether we should offer several PD tags.
The question about "specialised licenses" is whether we want to support uploading images that are under some other licence. As long as the uploader is the copyright owner, it is easy to require the use of one of a small set of standard licences, but if we want to use a photo that is licensed under any other licence, we have to either do without or accept that licence, however "specialised" it is. I think "Something else" has to be offered for those cases. I will add that, and corresponding language to MediaWiki:Uploadtext.
--LPfi (talk) 14:54, 15 February 2019 (UTC)
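(For reference, the drop-down discussed above is generated from the MediaWiki:Licenses system message, which is a plain wikitext list: each "**" entry pairs a licence template, substituted onto the file description page, with the label shown in the menu. A minimal sketch of what a merged list could look like – the exact template names available on this wiki are an assumption and may differ:
  * Creative Commons licenses:
  ** Cc-by-sa-4.0|Creative Commons Attribution-ShareAlike 4.0
  ** Cc-by-sa-3.0|Creative Commons Attribution-ShareAlike 3.0
  ** Cc-by-4.0|Creative Commons Attribution 4.0
  * Public domain:
  ** PD|Public domain (not copyrighted, or copyright has expired)
Anything not covered by the drop-down would then be handled by inserting the appropriate licence template by hand, as noted below.)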
The reason is that, per "appropriate credit", the box with the information about the license is necessary. --Comment by Selfie City (talk | contributions) 15:24, 15 February 2019 (UTC)Reply
Mentioning the licence is an additional requirement to that of giving appropriate credit, a requirement of many more licences than those in the drop-down. The other licences have to be handled by inserting the template by hand, or writing name and link by hand for those licences lacking a template. But as CC-BY-SA 4.0 has become the recommended licence on Commons, it is good that we offer an easy way to use it. --LPfi (talk) 16:46, 15 February 2019 (UTC)Reply
OK. I'll try the "Upload file" now to make sure it works, but the code that I saw looked good. --Comment by Selfie City (talk | contributions) 23:39, 15 February 2019 (UTC)Reply

GNU

Is it okay to use pictures under the GNU license for banners? I've never really paid attention to it before, because beneath it on Commons there are always Creative Commons licenses, but just checking. --Comment by Selfie City (talk | contributions) 01:16, 16 February 2019 (UTC)

As I understand it the GFDL is inconvenient for reusers and therefore generally not allowed as the sole license for new uploads to Commons (see here). This doesn't really matter for us directly. The only reason for us to avoid the license, as far as I can tell, would be to make it easier for people to meticulously follow the licensing requirements when reusing our content in offline/printed materials. Files that are dual-licensed under GFDL and CC are absolutely fine, we (or other reusers) can just use the CC license and ignore the GFDL. —Granger (talk · contribs) 02:14, 16 February 2019 (UTC)Reply
I see. In my case, I believe the files had both GNU and Creative Commons licenses listed, so when uploading the cropped version of the file, I cited the Creative Commons license. Thanks for explaining! --Comment by Selfie City (talk | contributions) 02:40, 16 February 2019 (UTC)
Oh, and thanks for making all those changes to "North Macedonia". --Comment by Selfie City (talk | contributions) 02:40, 16 February 2019 (UTC)Reply

Articles vs GPT

Swept in from the pub

I'm sure most of you guys are aware of GPT by now. GPT-4 (also the one on bing.com) can now summarize articles, and even do stuff like https://twitter.com/rowancheung/status/1655235900899573763?s=20 ("Find hidden gem travel spots in Maui, Hawaii")... Isn't it time to rethink our approach to putting together region articles? Obviously it's a chicken-and-egg problem, but still I'd say in the coming months/years it will progressively be less and less valuable to spend time on stuff that can be generated (esp. since GPT will have some statistical knowledge about the regions, regarding the most cited POIs). Is there some "will" here to discuss major changes, or will we do it the "good old way" until WV stops being relevant? I'd say city-level articles are mostly safe for now, but the higher-level ones... -- andree 17:50, 7 May 2023 (UTC)

Chat GPT has built-in problems with plagiarism, and also, when it's inaccurate, if we use it, who is at fault? We are not going to outsource anything to them. Ikan Kekek (talk) 17:54, 7 May 2023 (UTC)Reply
Obviously, changing WV to a GPT proxy wasn't the suggestion... Sure, in principle all the AI stuff is either hallucination or plagiarism, by definition. That doesn't change the fact that the way we access/search for (any) information will likely change dramatically. "Morality" issues aside (imho, an outcome like "Napster → Spotify" will come out of this, after some time of trying to fight it with regulations), I'd say we should brace ourselves and consider the options.
I'm but a minor WV editor, but I'm always quite surprised how many "most interesting" places are missing compared to various IG/FB travel tips channels. So that's what I mostly added in the past years. But finding the right place/city is often hard. And finding those "gems" via breadcrumb navigation is downright impossible. I'd like to hear some suggestions on how to deal with that sort of problem. How to make WV actually useful on its own. IMO the region articles are in a very poor state globally (sans the country and maybe 1-2 levels below). Is the plan that WV will only be a source of the articles forever, and bing/gpt/google/... will do the aggregation? -- andree 19:21, 7 May 2023 (UTC)
The obvious solution is that the most interesting places in each region need to be mentioned in the region articles. We should do a Collaboration of the month on making sure the most interesting places are mentioned in region articles, with a link to the local articles that have fuller listings of them (meaning that adding some listings in local articles will be part of the collaboration, too). And since I just finished grading for the semester, I'd be happy to take part in it. Let's decide on a good scope of work and phrasing for the Cotm, seek participants and get started with it. Lots of region articles have sucked forever. Ikan Kekek (talk) 19:49, 7 May 2023 (UTC)Reply
Proposed at Wikivoyage talk:Collaboration of the month#New Cotm to mention places of interest in region articles?. Ikan Kekek (talk) 21:38, 7 May 2023 (UTC)Reply
Is it possible to use ChatGPT or other AI to produce region summaries, using only the referenced city articles (and sub-regions) as sources? This could then be compared with what we have in the region article. AlasdairW (talk) 22:59, 7 May 2023 (UTC)Reply
You can try and find out, but I don't see why it's necessary. Ikan Kekek (talk) 23:21, 7 May 2023 (UTC)Reply
The use of AI for article creation is going to be one of the sub-themes for WikiConference North America submissions. OhanaUnitedTalk page 03:54, 8 May 2023 (UTC)Reply
Something like this would be my idea too. But purely with our articles, it's not possible to figure out (neither for a human nor for an AI) which POIs are the most interesting/popular/visited. Something like "1) find all POIs in the articles of the region, 2) score the POIs (e.g. by Google hits, or via GPT "what are the most interesting out of these"), 3) progressively fill the parent region structures with the POIs according to score" would be ideal. GPT could help here to summarize/reword/shorten the POIs, saving us time (at the cost of using external sources of undisclosed origin and license). I think doing all this by hand is not sustainable, unless we want to emulate monks rewriting books... But I don't have the answer for how to do this; I just suggest we start some discussion on the main WV goals (I reckon WP is already aware of the paradigm shift) - what the people around here love to do the most - and think about how we could do the rest more easily/optimally. IMO the lowest-level articles are where it's at, and obviously some country/top-level-region summaries. But the mechanical maintenance of regions, not so much...
TBH, as an example of the progress, I also think the separate language versions of WV may become quite obsolete. Asking GPT to "take ":wv:it:Roma", transform the templates and translate to English, produce a valid ":wv:en:Rome" article" will IMHO be possible within months. Perhaps it will even be able to give sources of the copied-in texts. It's the kind of task that GPT seems to excel at, and that instead takes us a loooong time and has no added value. Do we want to use this kind of technology, or keep the old ways? A sincere question - maybe people here mostly like the current ways, which is okay... -- andree 07:21, 8 May 2023 (UTC)
One of my recent concerns is the sheer capabilities of ChatGPT to provide accurate travel information within sections, which could possibly drive away readership (which isn't what this is about). As Ikan Kekek mentioned above, though, the best way to start is to not have empty region articles with absolutely nothing outside the cities/ODs list. SHB2000 (talk | contribs | meta) 06:33, 8 May 2023 (UTC)Reply
Who owns Chat GPT generated content?
“Absent human creative input, a work is not entitled to copyright protection. As a result, the U.S. Copyright Office will not register a work that was created by an autonomous artificial intelligence tool.”
Should WV have a "No Chat GPT" policy? Should there be a Chat GPT template that says "This article contains content generated by Chat GPT. View the page revision history for a list of the authors."
FWIW I see Chat GPT like dynamic maps. Sure, the static ones can be better, but the dynamic ones are much better than nothing. ButteBag (talk) 16:49, 8 May 2023 (UTC)
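(If such a notice were wanted, it could be a simple wikitext banner along the lines of existing maintenance boxes. A purely hypothetical sketch – no such template exists here, and the name and wording are placeholders only:
  <div style="border:1px solid #aaa; background:#fdf2d5; padding:0.5em;">
  '''Note:''' Parts of this article were generated with the help of an AI chatbot. Please check facts against official sources before relying on them, and see the page history for the list of authors.
  </div>
In practice the banner would live in a template – say a hypothetical {{AI-generated}} – so that tagged pages could be tracked in a maintenance category and the tag removed once the content has been checked.)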
The English Wikipedia has been talking about this for months, and I believe that they have come to the conclusion that there is no way to reliably identify whether a bit of text comes from ChatGPT or similar software. The original editor might choose to self-disclose, but if they don't, we'll never know. WhatamIdoing (talk) 19:20, 8 May 2023 (UTC)Reply
ah, ok, guess it's a non-issue then. Thanks! ButteBag (talk) 20:06, 8 May 2023 (UTC)Reply
If we can't enforce a ban, that doesn't make it a non-issue, because of the copyright implications you mentioned above. I think we should have a policy of opposing edits by chatbots. I think we have to at least do due diligence by putting people on notice, to avoid potentially being legally responsible for copyright violation as a site. How has en.wikipedia been dealing with that question? Ikan Kekek (talk) 21:13, 8 May 2023 (UTC)Reply
Agree we should oppose chatbot edits. Could you please elaborate on what we're putting people on notice for? My simple and limited understanding is that WV content contributors can use text generated by Chat GPT in their edits. The page I linked above seems to indicate it's not a copyvio. I have not used Chat GPT in any of my edits so far fwiw, but I would consider it if I found it helpful. ButteBag (talk) 23:14, 8 May 2023 (UTC)
When you say "chatbot", do you mean a human manually typing something into a website, reading the output from that website, and then copying the results into a Wikivoyage page? Or do you mean actual Wikivoyage:Bots, which edit autonomously without a human looking over the results first? WhatamIdoing (talk) 15:24, 9 May 2023 (UTC)Reply
Speaking for myself, "chatbot" means an autonomous bot adding chat gpt text to WV without human intervention. I think it's ok for a human to use chat gpt to generate text and paste it in, although obviously they should check for errors first and I'd much prefer a knowledgeable human editor instead. Maybe this nuance already exists on a policy page somewhere? ButteBag (talk) 19:29, 9 May 2023 (UTC)Reply
How about this? "Because of the risks of copyright violation and inaccuracy from chatbots, Wikivoyage editors are not permitted to add text produced by a chatbot unless they have carefully checked it for copyright violation and inaccuracy and made any needed edits before inserting it into Wikivoyage articles. Furthermore, should their edits introduce copyright violations, users make such edits at their own sole risk in case of a copyright lawsuit by an aggrieved party and may be banned from editing if such violations are discovered." Ikan Kekek (talk) 20:33, 9 May 2023 (UTC)Reply
Otherwise I agree, but how can you carefully check that there are no copyright violations? The plagiarism programs used at universities? I would rather advise rewording the facts as if they were fetched directly from a copyrighted text. –LPfi (talk) 20:43, 9 May 2023 (UTC)
Using GPTZero. SHB2000 (talk | contribs | meta) 21:31, 9 May 2023 (UTC)Reply
So should we use this text? "Because of the risks of copyright violation and inaccuracy from chatbots, editors are not permitted to add text produced by a chatbot unless they reword it completely before inserting it into Wikivoyage articles - the same standard that applies to the use of information from sources that are 'copyright, all rights reserved'. Furthermore, if users' edits introduce copyright violations, they make such edits at their own sole risk in case of a copyright lawsuit by an aggrieved party and may be banned from editing if such violations are discovered." Ikan Kekek (talk) 21:41, 9 May 2023 (UTC)Reply
I'd steal much of the 3rd paragraph from Wikipedia:Large language models. The term LLM also seems more precise to me than chatbot. It's tough to enforce a 100% complete rewording always. Maybe the output is different enough in some cases, who knows. I tried the gptzero link above and it misidentified some of the generated text I pasted into it.
"LLM-generated content is often an outright fabrication or biased. LLMs must not be used in areas where the editor does not have substantial familiarity. Output text must be rigorously scrutinized for factual errors and adherence to copyright policies. Editors are fully responsible for their LLM-assisted edits. If you are not fully aware of the risks, do not edit with the assistance of these tools. Furthermore, if users' edits introduce copyright violations, they make such edits at their own sole risk in case of a copyright lawsuit by an aggrieved party and may be banned from editing if such violations are discovered." ButteBag (talk) 00:48, 10 May 2023 (UTC)Reply
Just a quick note: Bias is a Wikipedia issue, not a Wikivoyage issue, as long as it's fair. Ikan Kekek (talk) 01:06, 10 May 2023 (UTC)Reply
If the user needs to thoroughly verify the accuracy of everything (which is important because GPT's output is so full of inaccuracies) and rewrite everything (to avoid hard-to-check-for copyright violations), then what's the benefit of getting the material from the LLM? It would be simpler not to allow text from GPT at all. That would also be safer – I worry that if editors start copying GPT output into Wikivoyage, they're unlikely to check the accuracy as thoroughly as necessary, and in my experience ChatGPT has an uncanny knack for writing things that sound true but are completely made up.
The best use case I can think of would be to use GPT to generate a bullet-point list of attractions, which the editor could then write original descriptions about. But in this case both the information and the wording would come from the editor; GPT would only be helping to brainstorm, so to speak. —Granger (talk · contribs) 03:13, 10 May 2023 (UTC)Reply
Editors may be tempted to use such text, so highlighting the problems is better than just blankly forbidding their use (which may be seen as pure prejudice). I agree that brainstorming help is the best approach. Usually official links to the POIs found should be included in listings, and those links can be used to check facts. The AI's role would be to list potentially interesting places and provide links or search terms for finding them. LPfi (talk) 08:06, 10 May 2023 (UTC)
But this isn't exactly right. Copyright-wise, the stuff GPT creates is in principle no different from you learning about a city from a few travel books and writing it in your own words. It doesn't usually copy-paste blocks of text (unless you ask it to). Rewording such non-directly-copied text gives you the same thing, just with an extra step. Not to mention, it's almost impossible to distinguish GPT-generated text (sans the obvious mannerisms it has at the moment, which will surely be gone in a few months). Perhaps it'll really be better to wait until WP makes some "official rules" and discuss those?
But the idea behind this topic was more about the general "heading" of the project. As I said, lots of the work we are doing now may soon become useless - e.g. due to changes in the way people search for stuff. So we could try to "think out of the box", suggest stuff that looks like it could be (easily) automated and would move the project forward, and start brainstorming those? -- andree 20:22, 10 May 2023 (UTC)
I understand that the copyvio concerns are about whether the copyright is owned by the human who uses the LLM and posts it on wiki vs the people who created the LLM. Nobody seems to think that traditional copyvios (e.g., the LLM stealing paragraphs from other websites) are likely to be a concern. While I've not seen any authoritative answers to that question, the general sense seems to be that the output of an LLM is not eligible for copyright in the US (=is automatically and inherently in the public domain, which is fine with us). Consequently, we probably don't need to warn people against copyvios in this context.
About the "check for accuracy" idea, I think that two things would be useful:
  • You are responsible for whatever you post, regardless of whether you wrote it yourself, ran it through a grammar checker, transformed it through machine translations, or used other tools to generate the content.
  • Any bot or other software that edits pages directly must comply with the Wikivoyage:Script policy.
In other words, the rules are fundamentally the same as they've always been. WhatamIdoing (talk) 16:23, 12 May 2023 (UTC)
I assume there are different AI engines that behave in different ways. If they do not quote passages from text they have encountered, then I believe there to be no copyright problems with current legislation. But anyway, yes, the user is responsible. The problem is that the temptation to trust an AI bot may be there, and you can generate text much faster with that method than by copying pieces of random reviews. –LPfi (talk) 16:36, 12 May 2023 (UTC)Reply
We could update the script policy to address the speed of editing more explicitly. It currently mentions the "pace" of editing in the lead, and later establishes a rule that "Bots without botbit should make only one edit per minute to prevent flooding the recent changes." We could expand that to include content generated through LLMs or other automated or semi-automated means. WhatamIdoing (talk) 14:52, 13 May 2023 (UTC)Reply
Here I wasn't thinking of any speed close to the bot thresholds. Rather, using an AI bot you could upgrade a section a minute instead of one every fifteen minutes with manual fact gathering. LPfi (talk) 06:29, 16 May 2023 (UTC)
Two thoughts:
  • We could change that from "one edit per minute" to "one edit per minute for simple, repetitive changes, such as fixing a typo and one edit per five minutes for content creation".
  • Even if it takes 15 minutes to write something by hand, it doesn't take 15 minutes to review it. I don't remember the last time I needed to spend more than 60 seconds looking at a diff before deciding whether it needed to be reverted.
WhatamIdoing (talk) 15:23, 16 May 2023 (UTC)Reply
Hello everyone, I am an editor of Wikiviajes in Spanish. On this topic, I would like to add a little. A few weeks ago I made comments on the Spanish Wikipedia about this, and in one of the points I discussed, the artificial intelligence gave me wrong information about a destination, which I transcribe:
"Here is a list of 10 things you can do in Palmera City (I used a small city in my country, but for privacy reasons I do not disclose it):
  • Visit the museum of death.
  • Take a beer tour.
  • Visit the ecological park.
  • Eat at a local restaurant.
  • Watch a movie at the cinema.
  • Shopping at the market.
  • Visit the water park.
  • Visit the zoo.
  • Take a trip to the canyon."

The thing is that in that city there is no such museum, no zoo, and much less a water park; the nearest canyon is on the other side of the state, a few thousand miles away. Apparently the source it got this from does exist, and it is a tourist website - ironic, giving wrong information to the traveller... Diff

Greetings. --Hispano76 (talk) 01:34, 26 May 2023 (UTC)Reply

I assume the AI does such things by itself: what points of interest do you usually suggest when somebody is asking? Often you do suggest the zoo and a water park. The AI just doesn't have any idea that they aren't relevant for this destination. Of course, as the AIs develop, they will get a better "understanding" of what is required for those to be suggested (such as that they should exist), but similar errors should be assumed to happen in the foreseeable future. That's why these attractions should be linked to the official website, where the location and other details can be checked. If the AI cannot provide it, it cannot be trusted. –LPfi (talk) 08:48, 26 May 2023 (UTC)
ChatGPT also thought it was possible to travel from FI to DK via AX (also allowing time to explore all three countries) in one day before correcting itself yesterday. SHB2000 (talk | contribs | meta) 08:57, 26 May 2023 (UTC)Reply

AI and Wikivoyage

Swept in from the pub

Food for thought. I asked ChatGPT to "write a wikivoyage article about la toussuire" (wikipedia:La Toussuire). (This is a French tourist resort missing from WV that a student of mine would like to write about; I've used it as a test case of how useful this would be for us - and for my class.) https://chat.openai.com/share/bbbdbd98-da67-4fd9-8251-7f0ed2006415 Obviously, I cannot prevent my students from using this tool, and we need to be aware that some will do so - and not just students, there will be (new?) editors who will try to help like this (or "help"). And I think it is a good learning opportunity for my class, particularly when it comes to stuff like "how to get started with your article", and a reminder to double-check all information / verify all factual claims. But as for WV at large, this does suggest AI could be of use to us by generating entries - as long as someone would check them, of course...? Anyway, if anyone would like to criticize the AI entry and be very specific about mistakes it made as it relates to WV's manual of style or other policies, do let me know - I'd like to show this example to students in a few days. Piotrus (talk) 09:03, 20 September 2023 (UTC)

FTR, the previous discussion about this issue can be found on Wikivoyage talk:Copyright-related issues#Articles vs GPT. --SHB2000 (talk | contribs | meta) 11:12, 20 September 2023 (UTC)Reply
I did not look at the accuracy or style (or the copyright questions raised in the other thread) of the ChatGPT generated article. Facts can be double-checked, and style can be fixed (no different from new pages that need some help, or old pages that have fallen out of date). I don't want to open a new can of worms, but you can even ask ChatGPT in your prompt or in a followup question to use wikivoyage formatting and templates, and then ask for the full markup ready to paste into a new article.
My concern with using AI to generate content is that it contravenes the philosophy of the whole guide. Consider the following:
"Wikivoyagers are travel writers and members of a world-wide community of contributors to Wikivoyage. [...] We are people just like you. Some of us are interested in travelling, some are interested in their local communities, and others are interested in wiki-housekeeping and organisation. What all of us have in common is that we want to share what we know with travellers everywhere. (Wikivoyage:About#Wikivoyagers)(emphasis added)
The value of our travel guide is that it is the product of a community effort to share what we collectively know about places for the sake of travel. It's great not just because it can be up to date and accurate, but there is something human about deciding to share information, and there is something equally human about acting on shared information. Especially in travel.
"Whenever travellers meet each other on the road, they swap info about the places they came from and ask questions about places they're going.(Wikivoyage:About)
Maybe I just enjoy coming home with a good story, but I'll always opt for the suggestion of a fellow traveller in a new place over something that happens to be the top result in any search engine. Presenting AI travel "advice" alongside what Wikivoyagers have written and revised over many years does a disservice to the values that make WV unique. Gregsmi11 (talk) 17:22, 20 September 2023 (UTC)Reply
@Gregsmi11 Fair point, but it does take a "fellow traveller" to make the AI do something. Is it fair to deny them this tool if they can use it responsibly?
That said, I tried the tool again and I think we need to be careful - something I'll stress to my students. For example, a while ago I wrote a guide to Chorzów. I've now asked the AI to do the same. Some of the AI-generated content seems usable at first glance, but it is certainly important to double-check everything. For example, it generated an entry for "Saint Jadwiga church" that contains an error and then some pointless generalities: "This stunning Neo-Gothic church is a notable architectural landmark. Its intricate design and beautiful stained glass windows are worth admiring." The church is not Neo-Gothic but Neo-Romanesque... It also invented a listing for a non-existent "Stary Browar Chorzów" and likewise for some restaurants that don't seem to exist - it seems it started hallucinating. Its Sleep section contains an entry on a real hotel followed by a fake one and some gibberish about apartments.
The amount of hallucinations I see gives me pause, given Wikivoyage does not require references. Piotrus (talk) 10:04, 21 September 2023 (UTC)
It's not fair to deny any tool used responsibly, but I don't think we know what responsible use of AI looks like yet (or at least we haven't written a policy about it). Formatting listings? Brainstorming attractions? Translating and copyediting? Creating basic outline articles where we have gaps in coverage? We've got the hammer, now we just need to invent some nails before we start swinging.
I think ChatGPT is especially risky because it can be very good at making hallucinations sound reliable. Pointless generalities tend to be edited out eventually, but small factual errors can live a very long time if every editor assumes they know less than the original contributor. It would only take a handful of well-meaning users with blind faith in AI before most of the work here turns into fact-checking. And that's probably the best-case scenario... just wait until an enterprising editor sets up an AI to edit directly. Gregsmi11 (talk) 14:07, 21 September 2023 (UTC)
I don't think ChatGPT has access to the internet. (Maybe I'm wrong?) It might be handy if you could say "Give me the URLs for five free activities in <city>", followed by "Okay, https://www.example.com is my favorite of the activities. Now give me a 100-word summary of the activity described on that website." WhatamIdoing (talk) 16:28, 21 September 2023 (UTC)Reply
Isn't there a risk of copyright violation from AI, too? People have to be responsible for their own words. Ikan Kekek (talk) 18:00, 21 September 2023 (UTC)Reply
Depending on the AI. If the wordings are derived from a big mass of text, none quoted verbatim, there is no copyright involved. The copyright laws may change because of the "write/draw like NN" problem, but that shouldn't be an issue for us. On the other hand, an AI can reuse phrases, if trained to do that, and if those are long or original enough, the result will be copyvios. –LPfi (talk) 19:58, 21 September 2023 (UTC)Reply
I think we'd go crazy trying to find the original source of text once the AI has rewritten and paraphrased it anyway. There's also the less legal, more ethical question of whether it is right for someone to claim authorship over text they retrieved from an AI. The user might be OK from a licence standpoint, but can they say they "wrote" something? Gregsmi11 (talk) 20:23, 21 September 2023 (UTC)Reply
Good point. LPfi, uncredited quotes or poor paraphrases don't have to be entirely verbatim to violate copyright. Ikan Kekek (talk) 20:50, 21 September 2023 (UTC)Reply
The US Copyright Office says that AI-generated content is not written by a human. It is therefore ineligible for copyright protection/automatically in the public domain.
It is possible for an LLM to generate something that matches existing copyrightable text, though it's not really an LLM if it just copies and pastes from its source files. I'm sure you've heard about the w:en:Infinite monkey theorem (the version that went around the playground when I was a kid said "an infinite number of monkeys typing on an infinite number of typewriters for an infinite amount of time would produce Shakespeare's plays"); LLMs will "randomly" match text much faster than the monkeys would, because they're not actually random, because there are multiple texts to match with (that's the w:en:Birthday problem), and because we don't need an exact match to have a copyvio.
But on our end, a copyvio gets detected the same way, regardless of whether it is intentional, negligent, or accidental, and whether it was generated by a human or an LLM. WhatamIdoing (talk) 15:43, 22 September 2023 (UTC)Reply
ChatGPT is at least better than Bard, which has outright lied to me several times. --SHB2000 (talk | contribs | meta) 12:36, 22 September 2023 (UTC)Reply
Here's an example of a new article that triggers AI content detection- Gwanggyo. If you've spent a lot of time on ChatGPT, you might recognize its "voice" in large parts of the article. Is this problematic? It's certainly no worse than many other new article creations that need template work and copyediting. A 10-second plagiarism check doesn't reveal anything. I'm having difficulty finding the "Gwanggyo Modern library" online, but I'm searching in English, so it's certainly not evidence of a hallucination. Gregsmi11 (talk) 14:07, 22 September 2023 (UTC)Reply
I used Bard and ChatGPT for travel ideas when I was in Tokyo and Osaka last month. Both are not bad, but they definitely underestimate the time for some activities when I asked them to create itineraries for me. I would say ChatGPT is worse because it told me to spend 90 mins in Dotonbori for lunch and exploration, but it'll take 25 mins for me to walk from the previous destination to Dotonbori. So that leaves me only 60 mins to eat and explore?! And it didn't schedule anything in Japan after 5pm, which completely neglects the nightlife scene. At least Bard named some concrete evening activities in Tokyo. OhanaUnitedTalk page 14:23, 22 September 2023 (UTC)
Another example, surely doomed for deletion: Sebago Lake. I can only guess this was copied directly from ChatGPT without attempting to use our standard templates or styles. However, the phrase "recreational opportunities for humans" does indeed have a certain ring to it. Gregsmi11 (talk) 18:22, 5 October 2023 (UTC)Reply

Walking tour copyright?

Swept in from the pub

Many guidebooks describe walking tours or other itineraries with a couple of waypoints. The text in the book is normally copyrighted, and cannot be copied to Wikivoyage word for word. Provided that the text is rewritten, does any intellectual property law, or similar, cover a sequence of waypoints? The Millennium Tour is based on a map published by the Stockholm City Museum, which is a public institution. As the tour had no licensing from the author's estate, and is no longer actively hosted, it seems to be fair game. The Stockholm history tour is a composition of a couple of guidebooks and guided tours. I am considering making a Haunted Stockholm Tour, based on a copyrighted book on the topic, which is in turn based on legends and folklore. There is also a commercial tour on the topic. How close can a travel topic be to a copyrighted work? /Yvwv (talk) 03:41, 27 November 2023 (UTC)

As a non-lawyer and speaking for American law, the closest things that come to mind are 1.) the arrangement and selection of quotations can be copyrighted, even if the person assembling those quotations does not own the original copyright on the works being quoted and 2.) sets of plain facts cannot be copyrighted. So if a walking tour is something like the arrangement of quotations, then yes. But if following a linear path that takes you from the shore to inland or from north to south in the most efficient route or that goes from the town's oldest building to its newest one, etc. is just elaborating on a set of facts, then no. As someone who is still not a lawyer by the end of this comment, I'd imagine that tour pathways are generally going to be the latter, but as with most real legal questions, the answer has a lot to do with if someone sues you and how much money he has to afford good lawyers. —Justin (koavf)TCM 04:13, 27 November 2023 (UTC)Reply
I hope expensive lawyers aren't key in Sweden. Finnish law is closer, but neither am I a lawyer, and I haven't researched that aspect of law. If I were inspired by a former tour, I would attribute the original as common-sense courtesy. It is probably best to either just take inspiration without copying, or to ask for permission to revive the tour, although I don't see that the itinerary itself could be copyrighted (as it is about ideas). I assume the situation is similar to copying the plot of a novel without copying the actual writing. –LPfi (talk) 09:37, 27 November 2023 (UTC)
We might also consider business ethics. Professional tour guides normally don't publish their itineraries, so that their knowledge stays useful. Open-source articles on other tours in the same city are, however, more likely to increase interest in the city, and leave visitors with more money to spare there. And in general, Wikimedia projects don't usually limit themselves to avoid rivalry with commercial publishers. /Yvwv (talk) 16:43, 27 November 2023 (UTC)
It might be worth doing some research in a large library (and maybe secondhand bookshops). Although you have identified one copyrighted book on Stockholm's Ghosts, are some of the waypoints also covered by other books? If 3 different books (by different authors) mention the location then good. Maybe there is even an out of copyright book that covers some of the locations (the 1890 guide to Stockholm?). AlasdairW (talk) 23:34, 28 November 2023 (UTC)Reply
After reading Linnell's book, I find it so comprehensive that it will be difficult to find any well-known ghost story in Stockholm that is not mentioned by the book. In any case, much of the quality of folklore lies in the dramatic storytelling. Linnell does not really deliver that, so over time, the article can hopefully be expanded with ghost legends told in a compelling manner. /Yvwv (talk) 19:42, 29 November 2023 (UTC)
I recommend you read w:Copyright in compilation. I think a sequence of waypoints is similar to "A directory of the best services in a geographic region", an example of copyrightable things listed in the article. --Hnishy63 (talk) 23:49, 29 November 2023 (UTC)
Thanks for the advice. The book (ISBN 9151827387) as a whole is close to being a complete database. It also suggests a few walking tours, one of them in Gamla stan; the intention of the Haunted Stockholm tour is to make a slightly different itinerary, partially inspired by commercial guided tours. /Yvwv (talk) 01:01, 30 November 2023 (UTC)
Another piece of advice: you may contact the publisher and ask for explicit permission from the author. Tell them the following: 1) I recently found that compilations are copyrightable. 2) My itinerary is non-commercial. 3) It uses a very small part of your work. 4) It may actually increase sales of the book. 5) I should have asked before publishing my itinerary. You have a good chance, and in any case it will clarify the situation. --Hnishy63 (talk) 22:20, 3 December 2023 (UTC)
Our license requires them to permit commercial use of our text. We are non-commercial, but re-users might not be. WhatamIdoing (talk) 15:01, 4 December 2023 (UTC)Reply
There are plenty of digital sources for Stockholm's ghost stories. I will use other sources than the main book when available, and also add waypoints not found in the book, to avoid copyvio. Hopefully, the article would be good to feature for Halloween 2025, after the 2-year cooldown since the featuring of the Swedish Empire. /Yvwv (talk) 15:14, 5 December 2023 (UTC)Reply

AI-"generated" edits

Swept in from the pub

In a discussion at User talk:70.68.168.129 @Ibaman: wrote:

"AI-generated edits are unneeded and unwanted in this travel guide. Shut down. Turn off."

I agree completely.

Other opinions? Does this need to be added to a policy page? Which one? Pashley (talk) 03:05, 7 April 2024 (UTC)Reply

Also Wikivoyage:Votes for deletion/December 2023#Car rental in Tashkent. While I haven't used AI to write anything, I have experimented by asking ChatGPT with this prompt: "Write a Wikivoyage article about [insert destination]" to see what would happen. I do appreciate AI in creating a draft by summarizing the destination (which then I could verify and adapt into my own words before creating the page if I had gone through with this). It appears to be quite accurate and I suspect that the LLM used Wikipedia page entries to learn the text for these destinations. I'm not sure where we should draw the line on AI usage in this project. OhanaUnitedTalk page 03:40, 7 April 2024 (UTC)Reply
Using AI responsibly like you describe it, including changing wordings and checking facts, should be no problem. If you do it like that, probably nobody will notice that you used AI. It doesn't differ that much from using encyclopaedias, competing travel guides and other external resources to gather information. In the cases where using AI has been apparent, the text obviously hasn't been checked and rewritten, and as we don't know how the AI in question has been trained, copyrighted expressions may remain and some of the statements may be hallucinations.
I don't know how to word a guideline so that it allows responsible use without encouraging use that is highly problematic. Those who use AI without accounting for the issues probably aren't prone to follow either the spirit or the wording of such a guideline, but may argue (stubbornly) that it allows their usage of such tools.
LPfi (talk) 06:01, 7 April 2024 (UTC)Reply
I've used ChatGPT for translating content, but that's about it (since the copyright remains with the original WM authors). It's fine to be used in discussions, but again, only for translations. --SHB2000 (talk | contribs | meta) 07:54, 11 April 2024 (UTC)Reply
As I see it, these programs are basically plagiarism machines & we already have more than enough problems with plagiarism -- mostly uncredited copying from WP or lumps of text from someone's marketing material. We've also sometimes had problems with machine-generated text, in particular some pretty awful translations.
I conclude that we should ban use of AI-generated text entirely, at least in main space. Pashley (talk) 08:08, 7 April 2024 (UTC)Reply
I agree with a complete ban. If that is too far, we could add "an individual editor may be allowed to use AI for a specific purpose if there is consensus on that use in the pub". Longer term I would like to see AI running on a WMF server to offer features like an improved InternetArchiveBot which could replace "dead link" with "suspect business closed - see this newspaper report of it closing". AlasdairW (talk) 13:00, 7 April 2024 (UTC)Reply
Given that you know you've used AI effectively when nobody can tell that you've used AI, I can't see how banning AI would hurt, as it basically implies no obvious AI, which no one wants anyway. Brycehughes (talk) 23:56, 8 April 2024 (UTC)Reply
For exactly that reason, I can't see how banning AI would help. The options are:
  • You use AI, but the content was good, so nobody objects – no harm, no foul (except perhaps in the opinion of people who believe that scrupulous compliance with rules is a morally good action, rather than a means to an end).
  • You use AI, but the content was bad, so it gets reverted – the "ban" was pointless (we revert bad content no matter how it's generated).
  • You don't use AI to create good content, somebody incorrectly claims you did, so good content gets reverted – we lose good content (and probably good will and time in unpleasant discussions, too).
  • You don't use AI to create bad content, somebody incorrectly claims you did, so bad content gets reverted – the ban was pointless (it would have been reverted for being bad anyway)
  • You want to use AI to create good content, but you're afraid of breaking the rules, so you don't contribute at all – we lose a new contributor.
BTW, I am leery of people who claim that they can tell the difference between a poor writer, an English language learner, and an AI tool. I put some of my Wikipedia articles through an AI detection program, and it had pretty much 50–50 results. I'm told that the accuracy is much worse for shorter content, which is most of what we do here.
If you want rules that can be effectively enforced, I suggest:
  • No high-volume editing by newcomers.
  • Regular patrollers are encouraged to check some of the facts independently.
WhatamIdoing (talk) 03:17, 9 April 2024 (UTC)Reply
I agree with most of that reasoning. I am one of those who do believe that one shouldn't ignore rules even when they seem counterproductive – the "ignore all rules" rule is about cases where breaking the letter of a rule indeed follows the intended spirit of the rules as a whole.
Rule of Law is an important principle, which I believe is an important factor in the success of the Nordic countries, and being pragmatic about rules can have unforeseen consequences. (Still, the Finnish judicial system has what essentially is an ignore-all-rules rule: the main book of laws quotes Olaus Petri saying that what is not right cannot be law, and judges are allowed to ignore laws they deem unconstitutional – such as violating Human Rights.)
I would certainly not recommend using AI if we forbid it – but unenforceable rules are bad exactly because they undermine the respect for rules in general.
LPfi (talk) 07:59, 9 April 2024 (UTC)Reply
Don't we need to make remarks about the use of AI in Wikivoyage:Copyleft, at least? "Good" and "bad" is not the only issue; copyright violation is also relevant. Ikan Kekek (talk) 11:17, 9 April 2024 (UTC)Reply
Copyvio content is always bad content. However, AI generation is not synonymous with copyright violation. Training an AI system on CC-SA content, for example, does not violate anyone's copyrights. WhatamIdoing (talk) 20:47, 9 April 2024 (UTC)Reply
I'm concerned that inaccurate AI-generated content (which includes most ChatGPT-generated content about reasonably obscure topics, in my experience) may be harder to detect than other bad content. ChatGPT is good at writing things that sound plausible but are actually bogus. In other words, I'm skeptical of the reasoning in User:WhatamIdoing's bullet point that reads "You use AI, but the content was bad, so it gets reverted" – I think AI-generated content may slip through the cracks more easily than other misinformation.
I also think it may be useful to have some kind of warning for good-faith editors who may not realize how unreliable language models are in terms of accuracy. —Granger (talk · contribs) 00:56, 10 April 2024 (UTC)Reply
We could have a warning in some guideline giving advice on good and bad sources, without explicitly banning or accepting AI-derived content. A warning can hardly be seen as sanctioning it. –LPfi (talk) 07:14, 10 April 2024 (UTC)Reply
Although I have never used AI in any Wikimedia project, I have used AI to get solutions to given problems since 2023 (when the ChatGPT revolution took place). However, I always use my own words when I write the solution myself. If I were going to use AI to contribute to English Wikivoyage, I would ask for transport options, lists of attractions and activities with details, lists of notable hotels and restaurants etc. individually, and then use my own words to describe these. However, I would never ask it to write an entire WV article for me. Sbb1413 (he) (talkcontribs) 06:17, 10 April 2024 (UTC)
For reliability, the things found by AI should always be checked. If you don't find the POI or connection in other sources, confirming the location and other details, then don't list it. –LPfi (talk) 07:17, 10 April 2024 (UTC)Reply
It's hard to even read the discussion and make sense out of what the user wants to say. "It's important to" is a dead giveaway of ChatGPT. You don't want to spend your time writing your own thoughts down, but expect others to read it? -> you earned a Ban, easy as that. -- andree 09:48, 12 April 2024 (UTC)Reply
I don't think that phrase is a "dead giveaway". For one thing, that exact phrase appears on 363 pages here, and many of those pre-date ChatGPT's existence. WhatamIdoing (talk) 18:10, 12 April 2024 (UTC)Reply
YMMV, but 1/2 of answers I get (if I ask non-technical stuff in v3.5) have this condescending tone. I'm allergic to that, so it may be the case that it just triggers me  :-) -- andree 18:46, 12 April 2024 (UTC)Reply
I'm not surprised; ChatGPT, after all, needed to learn its content from somewhere. --SHB2000 (talk | contribs | meta) 21:41, 12 April 2024 (UTC)Reply
The sanctimonious tone sounded more like Gemini to me. But that one unhinged comment sounded like the infamous Sydney. Brycehughes (talk) 22:05, 12 April 2024 (UTC)Reply
AI-generated text is prone to factual inaccuracies (misinformation). AI-generated content is fine for things like abstract art, but it has no place in generating text or maps on this wiki. MercifulCarriage (talk) 20:38, 31 May 2024 (UTC)

So if I'm first using AI to create a draft of a new page and then subsequently modify it, should I create an initial version with just the AI output and then change it in subsequent revisions (to show what the original AI version was)? Or should my initial version already be revised from the AI? OhanaUnitedTalk page 01:58, 28 April 2024 (UTC)

I think the best way is indeed to also save the AI-generated one, for transparency. In similar cases, I usually prepare the second version in a separate tab, to be able to save it quickly, before anybody uses the other one (to avoid an edit conflict, one can click edit on the first version and paste in the revised one).
If you don't use any wording or structure from the AI, just want tips on sights and services, then I'd recommend just including a note on the AI part in the edit summary ("with checked POIs suggested by [name of tool]" or similar). LPfi (talk) 07:47, 28 April 2024 (UTC)Reply
Save it where? We don't want people to save AI-generated text here, for several reasons. Ikan Kekek (talk) 22:10, 28 April 2024 (UTC)Reply
Perhaps I'll use an actual example rather than hypothetical scenarios. On Wikipedia, the entry on artwork title was first created via ChatGPT and saved. For transparency, the edit summary clearly marked that the edit was done using ChatGPT. It was then heavily modified and substantially rewritten in the editor's own words. Is this an acceptable practice in Wikivoyage page creation (to show what the initial ChatGPT answer was)? Or do we want to work offline to adapt the ChatGPT contents into our own words before saving the final version to publish the initial edit? OhanaUnitedTalk page 14:59, 30 April 2024 (UTC)
We don't have to ban AI (as others have stated, it's not really feasible), but I think discouraging its use in favor of original edits is fair and aligned with our goals, as well as our promise that the site is built by REAL travelers. I have experimented with AI travel questions about places I know well, and the AI answers are riddled with mistakes and completely made-up information, but they are presented in an authoritative-sounding way that could trick editors and readers who are not familiar with the places into not recognizing them as bad content and misinformation. ChubbyWimbus (talk) 15:39, 30 April 2024 (UTC)
Frankly I don't see the benefit of this stuff. It strikes me as the worst possible combination: it sounds knowledgeable but is in fact completely unreliable. Why would we want to put something like that on Wikivoyage? —Granger (talk · contribs) 16:14, 30 April 2024 (UTC)Reply
To OhanaUnited: I don't support saving raw AI content to Wikivoyage on any page, because it may contain copyright violation, and the user who saves the content has no way of knowing whether it does or not, nor what page(s) it may have come from. Ikan Kekek (talk) 16:44, 30 April 2024 (UTC)Reply
It's still beneficial for translations, which is why I don't support completely banning AI. --SHB2000 (talk | contribs | meta) 21:43, 30 April 2024 (UTC)Reply
It is a good step to discourage AI in favour of manual editing of travel articles. I won't support a complete ban on it, as it might be used to translate articles written in other languages, especially Italian. Of course, the AI has to be more reliable than Google Translate for this. Otherwise, as said before, "I would ask for transport options, list of attractions and activities with details, list of notable hotels and restaurants etc. individually, and then use my own words to describe these" if I were to use an AI tool. I will clearly mark my edits as "based on Foo AI" if I write something based on what AI says. Sbb1413 (he) (talkcontribs) 04:22, 1 May 2024 (UTC)
I would support asking for transport etc. rather than complete articles, and then writing the article text oneself, based on official pages (or trustworthy reviews). @Ikan Kekek: But if one uses AI to create actual text to be used (even in rewritten form), then I think the transparency aspect is much more important than the problems with saving AI text.
For copyright, I don't think we care too much about copyright violations remaining in the history – the text is already distributed on the internet, much more easily available than through the history, otherwise the editor or AI wouldn't have found it. I also don't think anybody is going to restore dubious statements from the AI text.
LPfi (talk) 07:52, 1 May 2024 (UTC)Reply
AI works are also automatically assumed to be in the public domain since no one can own its work (as it lacks human authorship). SHB2000 (talk | contribs | meta) 22:34, 1 May 2024 (UTC)Reply
AI tools are trained with works authored by people. There have been claims that enough copyrightable matter is left in their output, at least in some cases. I don't know to what extent this has been researched, and I assume we don't have high court decisions yet on whether the tools cause actual copyright violations. –LPfi (talk) 07:00, 2 May 2024 (UTC)
Also, the courts in different countries may reach different decisions, or for different types of tools, so it's complicated.
The "claims that enough of copyrightable matter is left in their output" (i.e., the ones that I've heard about) are concerns about the tool generating allegedly "new" content, part of which happens to match existing content. That it happens on occasion is not particularly surprising, since humans occasionally do the same thing. WhatamIdoing (talk) 16:31, 2 May 2024 (UTC)Reply
Indeed.
If the same wordings are there by chance, just because they are common or natural ways to express things, then there should be no copyright problem. But some tools may have used some sources too heavily, and it becomes like a student using their teacher's favourite phrases, which could cross the line; and some may (like some students) learn to express things on a less abstract level, with a much higher probability of ending up with identical phrases. It is hard to teach pupils to describe phenomena in their own words, and it may also be hard to teach AIs to do it.
Regardless, if we reword things to match our style, like we should do with content from any source, then the risk of copyright violations remaining on a page is small. Whether there were some in the former versions, available through the history – which spider bots are told not to crawl – is not a problem in my view.
LPfi (talk) 10:04, 3 May 2024 (UTC)Reply