Category talk:Articles with dead external links

From Wikivoyage

Help wanted - fix broken links

Swept in from the pub

I've written a bot that will flag broken URLs with {{dead link}}, which will print a very noticeable warning next to broken links for people who have enabled the ErrorHighlighter gadget from Special:Preferences (for those who haven't enabled the gadget the warning is invisible). Articles with broken links will also appear in Category:Articles with dead external links. I'm still validating that the bot won't break anything and have thus only run it against Category:Star articles and a handful of other articles, but at present that still leaves over 50 articles needing links fixed (or removed, in cases where the associated business has closed). Please help out by reviewing/fixing broken links in Category:Articles with dead external links, and if you would like to see the bot run against a specific article or group of articles please let me know and I will do so. Feedback appreciated. -- Ryan • (talk) • 17:16, 9 April 2016 (UTC)

It would be nice if you could somehow mention the exact link that is dead. Preferably on the talk page in question. Hobbitschuster (talk) 17:38, 9 April 2016 (UTC)
What does it mean that a link is dead, in this context? A website can be temporarily down or unavailable. Even HTTP 404 responses may be due to a temporary misconfiguration. --LPfi (talk) 18:05, 9 April 2016 (UTC)
@Hobbitschuster: if you enable the ErrorHighlighter gadget then you will see a very noticeable "dead link" message right next to the broken link. @LPfi: right now the bot flags links that return 404 errors (page not found) and DNS lookup errors (site not found) as dead links. I've set the bot up so that when it is re-run it will first delete all instances of {{dead link}} in the article, so if a link that was broken somehow comes back to life it would no longer be flagged as dead. -- Ryan • (talk) • 18:22, 9 April 2016 (UTC)
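The rule described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names and the regular expression are assumptions, and the only facts taken from the discussion are that 404 responses and DNS lookup failures count as dead, and that existing {{dead link}} tags are removed before a re-run.

```python
import re

# Per the discussion: only "page not found" (404) and DNS lookup failures
# are treated as dead. Everything else is left alone.
DEAD_HTTP_STATUSES = {404}


def is_dead(http_status=None, dns_failed=False):
    """Return True for the two failure modes the bot is said to flag."""
    if dns_failed:
        return True
    return http_status in DEAD_HTTP_STATUSES


# Hypothetical pattern for {{dead link}} with an optional parameter,
# e.g. {{dead link|April 2016}}. Leading spaces/tabs are removed too.
DEAD_LINK_RE = re.compile(r"[ \t]*\{\{\s*dead link\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)


def strip_dead_link_tags(wikitext):
    """Remove every {{dead link}} tag so a re-run starts from a clean article."""
    return DEAD_LINK_RE.sub("", wikitext)
```

With this approach a link that has come back to life simply does not get re-tagged on the next run, because all old tags are stripped first.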
Where do you activate that? Hobbitschuster (talk) 18:46, 9 April 2016 (UTC)
In Preferences / Gadgets / Experimental, tick ErrorHighlighter. -- WOSlinker (talk) 19:07, 9 April 2016 (UTC)
Thanks. Hobbitschuster (talk) 19:34, 9 April 2016 (UTC)
OK. I suppose those errors should not occur on well maintained sites. It would probably still be good to include a timestamp in the template, so that a link that has been dead a long time and those recently marked could be told apart. Then the old templates should also not be removed, but left alone, unless the link has come live again (or the error has become a transient one). --LPfi (talk) 18:26, 10 April 2016 (UTC)
The timestamp is already being added - see Special:Diff/2966265/2969005 which added {{dead link|April 2016}} to nine links. The current implementation uses month and year, which matches w:Template:Dead link, but I would need to modify the bot to leave old template timestamps in place when the bot is re-run. -- Ryan • (talk) • 18:36, 10 April 2016 (UTC)
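A minimal sketch of the month-and-year tagging, including the not-yet-implemented idea of leaving an older timestamp in place on a re-run. The function name and its parameters are hypothetical; only the {{dead link|April 2016}} format comes from the diff mentioned above.

```python
from datetime import date

# Spelled out explicitly so the result does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def dead_link_tag(checked_on, existing_stamp=None):
    """Build a {{dead link|Month Year}} tag.

    If the link already carried a timestamp from an earlier run
    (existing_stamp), keep it so long-dead links can be told apart
    from recently flagged ones.
    """
    stamp = existing_stamp or f"{MONTHS[checked_on.month - 1]} {checked_on.year}"
    return "{{dead link|" + stamp + "}}"
```

For example, `dead_link_tag(date(2016, 4, 9))` gives `{{dead link|April 2016}}`, and passing `existing_stamp="April 2016"` on a later run keeps that older date.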
Sometimes, a link looks to have "come live again" but is actually being cybersquatted - the original venue is still dead and some unsavoury characters registered the name the moment the original legitimate registration expired. The site then returns advertising, linkspam or a listing of the domain name for sale at some extortionate price. Often, it merely redirects traffic to some other domain. If we link to that sort of domain, it makes us look spammy. K7L (talk) 18:44, 10 April 2016 (UTC)
The bot is admittedly much more limited than a human editor - for example, there is no way for a bot to accurately determine if a link is to a site that is being cybersquatted, and as noted previously I'm not flagging sites that time out or have other potentially temporary issues. That said, I think there is significant value in flagging links that are clearly dead, both to ensure we are linking to accurate information and as a way to more easily find listings for places that might have gone out of business. -- Ryan • (talk) • 19:15, 10 April 2016 (UTC)
The bot will not find all links that need updating but it is finding enough for now. Looks like there is much work to do; it is going to take a concerted effort to fix them all, but this will improve the site for readers and its search engine ranking. --Traveler100 (talk) 19:58, 12 April 2016 (UTC)
Two updates: first, I've been running the bot in batches against Category:Guide articles, but it's slow going since I want to review all changes to catch any bad edits - examples of bad edits include this one to the "Humphrey's" listings that require fixes to the bot code to handle unexpected characters like a semicolon in a URL. Second, for some reason I am occasionally seeing DNS lookup failures for valid sites, which the bot then flags incorrectly. I switched to Google Public DNS, but I've still seen a couple of false positives; I'd like to get that issue resolved before having the bot run against too many articles. -- Ryan • (talk) • 20:24, 12 April 2016 (UTC)

Update

As of 17 April the bot has now run against all star & guide articles, so any dead links in those articles should now be tagged with {{dead link}}. While the vast majority of tagging was done without issue, there are a tiny number of edge cases that aren't handled properly and require updates to the code before the bot can be run without supervision. In the meantime, if anyone wants to see the bot run against a specific article or group of articles please let me know. -- Ryan • (talk) • 18:13, 17 April 2016 (UTC)

Until now I've been running the bot in batches and manually reviewing changes in order to catch any problems. Issues that I've fixed include problems with URLs ending in ")", issues with w:Internationalized domain names, occasional DNS lookup failures for valid URLs (I've switched to Google DNS to resolve that one), etc. Since things look fairly good at this point I'm going to let the bot run unsupervised, but if anyone notices any links flagged incorrectly please let me know so I can fix the code. -- Ryan • (talk) • 04:29, 5 May 2016 (UTC)
The bot is doing a good job. I have just noticed on Cramlington that it marked 5 links, but the edit summary only said "Flag 4 potential dead links". In this case this is because two links are the same. I think that this is sufficiently rare not to be a problem. AlasdairW (talk) 21:44, 5 May 2016 (UTC)
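The Cramlington mismatch can be illustrated with a small sketch. It assumes, as the comment above suggests but the bot's code does not confirm, that the edit summary counts distinct URLs while the tag is added at every occurrence. The function name and example URLs are made up.

```python
def summarize_flags(flagged_urls):
    """Return (tags_added, distinct_urls) for the dead URLs found in one article.

    flagged_urls holds one entry per place the URL appears in the article,
    so a URL used twice shows up twice.
    """
    tags_added = len(flagged_urls)
    distinct_urls = len(set(flagged_urls))
    return tags_added, distinct_urls


# Five occurrences, but one URL appears twice, so only four distinct links.
example = [
    "http://a.example/",
    "http://b.example/",
    "http://c.example/",
    "http://d.example/",
    "http://a.example/",
]
print(summarize_flags(example))
```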

The bot has finished running against all articles. After ten years the site has unfortunately built up a lot of dead links, but hopefully having a way to tag them will allow for easier future maintenance. -- Ryan • (talk) • 07:03, 10 May 2016 (UTC)

For anyone else who is like me and keeps an eye on articles within a certain region, a useful tool for finding articles in that region with dead links is https://petscan.wmflabs.org/. Here's an example for using that tool to find all articles with dead links within Southern California: [1] (replace "Southern California" with your region of choice). -- Ryan • (talk) • 19:23, 14 May 2016 (UTC)

Dead links bot

Swept in from the pub

For the first time since last May I'm re-running the bot that tags potentially dead external links with {{dead link}} and adds articles to Category:Articles with dead external links in the process. After 24 hours the bot is up to Den Helder, so I expect it will take another 4-5 days to scan everything. This bot is useful for tracking down closed businesses and for updating stale data, so if there is a particular region you like to look after, consider doing the following:

  1. Enable the "ErrorHighlighter" gadget from the "Gadgets" tab of Special:Preferences. Once enabled you will be able to see dead links and other syntax issues highlighted in articles.
  2. To see a list of articles that contain dead links within a region, go to [2] and change "California|6" to whatever region you are interested in (example: "New York City|6").

Let me know if there are any questions or concerns. Kudos to User:Traveler100 and User:AlasdairW who have already been scrambling to fix dead links as the bot is updating things. -- Ryan • (talk) • 22:18, 24 January 2017 (UTC)

Aw! I was excited that the number of articles with dead links fell below 7000. I started to fix dead links for New South Wales related articles and I was going to do it for all of Australia hopefully, but now it will take a longer time. Oh well, just more work to do. :) Gizza (roam) 04:41, 25 January 2017 (UTC)
The war against link rot is (unfortunately) never ending :) -- Ryan • (talk) • 06:13, 25 January 2017 (UTC)
I think it can at least be made easier by people not including stuff like "/home.html" in links in the first place. Quite often dead links are fixed by just cutting off something like that. Please try and be on the lookout for stuff like that when adding links. Hobbitschuster (talk) 15:54, 25 January 2017 (UTC)
I have it on my TODO list to have the bot re-check any dead link of the form "http(s)://www.example.com/(index|default|home)*", and if the link works without the "index|default|home" part to then replace it, but I haven't gotten around to implementing and testing that yet. I did recently run an update that fixes links with extra slashes in the URL ("//") or that have an improper protocol ("htp://", "http//", etc). -- Ryan • (talk) • 16:21, 25 January 2017 (UTC)
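The URL clean-ups mentioned above could look something like the sketch below. It is not the bot's code: the function names and regular expressions are assumptions, and the "index|default|home" step only produces a candidate URL that would still have to be re-checked before replacing anything.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches scheme typos such as "htp://" or "http//" (and the correct form).
_SCHEME_TYPO = re.compile(r"^(h+t+p+)(s?):?/+", re.IGNORECASE)

# Matches a path that is just /index*, /default* or /home* directly after the host.
# This is deliberately loose, like the pattern quoted in the discussion.
_INDEX_PAGE = re.compile(r"^/(?:index|default|home)[^/]*$", re.IGNORECASE)


def fix_protocol(url):
    """Normalise a mistyped http(s) prefix to "http://" or "https://"."""
    match = _SCHEME_TYPO.match(url)
    if not match:
        return url
    return "http" + match.group(2).lower() + "://" + url[match.end():]


def collapse_slashes(url):
    """Replace runs of slashes in the path (not after the scheme) with one slash."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def without_index_page(url):
    """Return the URL with a trailing index/default/home page removed, or None.

    The result is only a candidate: the bot would need to confirm that it
    works before swapping it in for the dead link.
    """
    parts = urlsplit(url)
    if not _INDEX_PAGE.match(parts.path):
        return None
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
```

For instance, `without_index_page("http://www.example.com/home.html")` yields `http://www.example.com/`, while a URL with any other path returns `None` and is left as flagged.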
That would help, indeed. However, the bot also flags a large number of listings for places that have simply closed or changed ownership. I find it a bit depressing... but it's certainly very useful :-) JuliasTravels (talk) 16:35, 25 January 2017 (UTC)
Another thing that might be worth taking into consideration (either in a separate bot or in a future update) is a specific type of link squatting that is falsely labeled as a live link (even if the link has been previously labeled "dead"), such as seen here. I raised the case of this specific link at Talk:Isla de Ometepe, but I think the issue is broader than that, as this particular design is particularly common (is it a particular hosting service that does this to previously live domains?). If it is possible and not too much work, implementing something that detects those (or simply not labelling any previously dead links live unless by hand) would be useful, especially if such a link became dead prior to the first bot run but never showed up as dead. Hobbitschuster (talk) 17:24, 25 January 2017 (UTC)
Domain squatters will be out of scope for anything my bot would deal with, unless someone can come up with a simple and reliable way for an automated tool to identify them. -- Ryan • (talk) • 17:32, 25 January 2017 (UTC)
Amazing how many have gone bad in less than a year. We had almost fixed all marked bad links for the United Kingdom and it is getting towards 100 again and not even halfway through the alphabet. --Traveler100 (talk) 18:13, 25 January 2017 (UTC)
Well, it helps to keep our guides up to date, so that's a good thing all around. How many of those links would you say are really dead dead and how many would you say are just the above outlined problem of complicated URLs jumping around? Hobbitschuster (talk) 18:19, 25 January 2017 (UTC)

@Wrh2: Can you at least tell the bot not to mark links as live that have previously been marked dead (unless they have since been marked live by human editors)? I am more comfortable with a handful of false positives than with linksquatters falsely labeled live links if there is a way to prevent it. Would that be possible to implement? Hobbitschuster (talk) 18:18, 25 January 2017 (UTC)

I would rather not make that change unless there is a broad agreement to do so. Some sites break temporarily, and sometimes people update links but don't remove the {{dead link}} template, so I think it is safest to reflect which links were active at the time the bot last ran. -- Ryan • (talk) • 18:22, 25 January 2017 (UTC)
+1 for doing the change: I get many more "working" links that go to domain squatters than real dead links. Jlg23 (talk)
I think it is more common that a link becomes live through a link squatter being mistaken for the real deal than for a previously dead link to become the genuine article once more. And while some do forget to remove the dead link template, this is caught when checking up on dead links, whereas false negatives are much harder to catch. I personally tend towards any system that produces false positives instead of one that produces false negatives, as false positives are usually less harmful when it comes to dead weblinks. Hobbitschuster (talk) 19:06, 25 January 2017 (UTC)
Yes, in the last month I have been going through the ones marked in May last year as bad links. When I clicked on them, they went to an active web site, in the majority of the cases to domain name squatters. --Traveler100 (talk) 21:07, 25 January 2017 (UTC)
I wonder if a custom edit summary would be a feasible middle ground here. Anyone interested in that problem could look for the edit summary. WhatamIdoing (talk) 01:35, 26 January 2017 (UTC)
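The two positions in this thread differ only in what happens to a link that was tagged dead before and now responds. A minimal sketch of that decision, with a hypothetical `sticky` switch standing in for the proposed change (no such option is known to exist in the bot):

```python
def should_keep_dead_tag(was_tagged_dead, bot_found_dead, sticky=False):
    """Decide whether a link carries {{dead link}} after a bot run.

    sticky=False: the tag reflects the latest run only (current behaviour
    as described in the thread).
    sticky=True: a previously tagged link stays tagged until a human
    removes the tag, which trades false negatives (squatted domains that
    look live) for some false positives.
    """
    if bot_found_dead:
        return True
    return sticky and was_tagged_dead
```

Under the proposed behaviour a squatted domain that answers with a normal page would stay flagged, since `should_keep_dead_tag(True, False, sticky=True)` is true; under the current behaviour the same call without `sticky` is false.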

Status update

The bot has now processed every article. For those interested in helping with cleanups:

Thanks to everyone who has helped with cleanups thus far - a review of recent changes shows that a lot of closed listings have been deleted, and a lot of broken links have been fixed. If anyone sees any bot edits that look incorrect please let me know so that I can fix it before running again in the future. -- Ryan • (talk) • 07:00, 28 January 2017 (UTC)

thanks for doing the update. Now we all have some work to do :-) --Traveler100 (talk) 07:53, 28 January 2017 (UTC)
Too true. It's a bad sign when an article like Trepassey and the Irish Loop breaks in a couple of places in the first four days after its creation because the government in St. John's moved the entire parks and environment web site. K7L (talk) 14:04, 28 January 2017 (UTC)

Statistics

As of 30 October 2017

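Articles with dead links by type and status in Category:Articles with dead external links

| Type/Status | Outline | Usable | Guide | Star | Unranked | Total (line) |
|---|---|---|---|---|---|---|
| District | 39 | 328 | 110 | 12 | | 489 |
| City | 2642 | 2144 | 0 | 0 | | 4786 |
| Airport | 5 | 7 | 2 | 0 | | 14 |
| Park | 142 | 44 | 16 | 0 | | 202 |
| Dive guide | 1 | 5 | 0 | 1 | | 7 |
| Region | 337 | 132 | 2 | 0 | 11 | 482 |
| Country | 86 | 17 | 0 | 0 | | 103 |
| Itinerary | 19 | 32 | 3 | 0 | | 54 |
| Travel topic | 41 | 32 | 8 | 1 | | 82 |
| Total | 3312 | 2741 | 141 | 14 | 11 | 6227 |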

Articles with dead links by continent and status in Category:Articles with dead external links

| Type/Status | Outline | Usable | Guide | Star | Unranked | Total (line) |
|---|---|---|---|---|---|---|
| Africa | 177 | 113 | 0 | 1 | | 291 |
| Antarctica | 1 | 0 | 0 | 0 | | 1 |
| Asia | 658 | 512 | 32 | 3 | | 1207 |
| Europe | 957 | 833 | 36 | 3 | | 1834 |
| North America | 1261 | 994 | 60 | 6 | | 2323 |
| Oceania | 52 | 48 | 3 | 0 | | 103 |
| South America | 161 | 183 | 1 | 0 | | 346 |