Wikivoyage talk:Spam filter
Discussion moved from document page here by (WT-en) Evan
Valid links incorrectly filtered
I'm trying to add this
- On the page http://en.wikivoyage.org/wiki/Cairns all other valid changes appear to be rejected because of a valid link to the Cas in o hotel at the bottom. Should I add it to the whitelist? - I'll try that
- Finnish phrasebook - 'd_onnerwetter.kielikeskus.helsinki.fi'
- The link to the historic Pharmacy Museum's website in the New Orleans/French Quarter creates a problem for those trying to edit the page, as "pharmacy" is one of the spamtrap words. I had to put a linebreak in the URL to save the page. "http://www.pharm acymuseum●org/" is hardly how it should be listed, but the Spam protection filter doesn't let me fix it. What can we do about this, or will it just be impossible to include a link in the article? -- (WT-en) Infrogmation 15:28, 29 Oct 2004 (EDT)
- I %66ixed it. It lo%6fks funn%79, but at least the link %77orks. -(WT-en) phma 23:53, 29 Oct 2004 (EDT)
- I stripped out most of the generic regular expressions from the banned content file. I'll try to keep the list down to real URLs rather than general ideas. --(WT-en) Evan 11:55, 2 Nov 2004 (EST)
- Trying to add http://www.360blueproperties.com to the rental agencies page for Destin, FL and it keeps saying it was blocked by the spam filter. What's the next step? —The preceding comment was added by (WT-en) Sprenkle (talk • contribs)
- The next step is removing that entire section from Destin, because not a single one of those entries complied with the apartment listing guidelines. (In turn, one of them was triggering the spam filter.) - (WT-en) Dguillaime 22:50, 25 August 2009 (EDT)
Spam links not filtered
New ones:
lyrics-sky●com lyrics001●com
-- (WT-en) Mark 08:39, 20 Mar 2005 (EST)
www.51wisdom●com www.ic37●com www.sj-qh●com www.fsyflower●com www.zjww●com www.websz●com www.fsyflower●com www.air520●com www.ywxjm●com www.163school●com●cn erjiguan.dzsc●com sanjiguan.dzsc●com dianrong.dzsc●com dianzu.dzsc●com dianweiqi.dzsc●com jichengdianlu.dzsc●com bianpinqi.dzsc●com lianjieqi.dzsc●com chuanganqi.dzsc●com dianganqi.dzsc●com juanyuancailiao.dzsc●com cixingcailiao.dzsc●com kaiguan.dzsc●com fangdaqi.dzsc●com diandongji.dzsc●com chazuo.dzsc●com dianchi.dzsc●com chongdianqi.dzsc●com dianre.dzsc●com yiqiyibiao.dzsc●com wujin.dzsc●com dianluban.dzsc●com jidianqi.dzsc●com dianziguan.dzsc●com bandaoti.dzsc●com guangdianyuanjian.dzsc●com dianyuan.dzsc●com www.dzsc●com www.zhkaw●com www.myseo●com●cn www.myseo●com●cn
more -- (WT-en) Mark 03:48, 2 Jun 2005 (EDT)
Two more from 3 Jun 2005:
www.oasales●cn www.bjicp●org
-- (WT-en) Wrh2 11:23, 3 Jun 2005 (EDT)
Holiday rentals
And this annoying guy too: www.holiday-rentals●com (WT-en) Jpatokal 12:59, 12 Nov 2004 (EST)
- It's really unclear to me if this one really counts as spam. Can we discuss this? I'd like to see what the range of views out there is. -- (WT-en) Mark 13:07, 12 Nov 2004 (EST)
- It's a commercial site, the guy reposted his ads after I removed them and didn't react to my message telling him to stop. So yeah, it's spam. Now, I'll grant that holiday rentals are a viable choice of accommodation in some places — but why this guy's commercial on every page and not somebody else's? (WT-en) Jpatokal 13:17, 12 Nov 2004 (EST)
- Almost all of our listings are for commercial enterprises (at least hotels usually charge me money). I don't think that being commercial has anything to do with it whatsoever.
- Is it the fact that this appears to be a web-based aggregator the problem? I want to know, because I'd really like to narrow in better on what is acceptable. -- (WT-en) Mark 13:25, 12 Nov 2004 (EST)
- I'm using "commercial" as short hand for "somebody trying to make money off somebody else's work" here, ie. it's not a primary source. (WT-en) Jpatokal 08:02, 13 Nov 2004 (EST)
- It appears to be a web directory, so links to the site are generally inappropriate (we want direct listings here). So as far as adding it to the filter, I have two opposing feelings about this: a) I'm not really concerned enough to add it to the filter if they only do this once and b) I can't see any reason why we would ever have a link to the site, so it might not hurt to add it rather than think about it. -- (WT-en) Colin 13:41, 12 Nov 2004 (EST)
- Oh nevermind, he re-added a couple after they were removed and a message sent to him. Add it. -- (WT-en) Colin 13:42, 12 Nov 2004 (EST)
- Yeah, you're right. It's a web-only listings collector. We shouldn't accept this edit. I guess part of what's bothering me, though, is that we have a history of not allowing primary sources for apartment rentals either (anybody remember Patricia from Brazil?). I think I'm going to try to get a discussion about this going on Talk:Finding accommodation -- (WT-en) Mark 04:53, 13 Nov 2004 (EST)
Discussion
I moved the discussion stuff from the document page here. --(WT-en) Evan 23:05, 1 Nov 2004 (EST)
Complete removal of SPAM links
I think the reason they spam is to gain PageRank, as explained at http://www.google●com/technology/
Just deleting the spam entry is not helpful because the wiki still links to it on edit history and diff pages like http://wikivoyage●org/en/index.php?title=Main Page&diff=47131&oldid=47129
So there should be a way to completely hide these entries from history preview also. -(WT-en) Bijee 23:46, 9 Dec 2004 (EST)
- I don't think so. Our Robots File tells robots to avoid looking into history and diffs. And Googlebot obeys robots.txt. Of course, spammers might be under the impression that the indexing will occur, and are thereby encouraged to annoy us. -- (WT-en) Colin 23:59, 9 Dec 2004 (EST)
Recent Changes Home page
I made Special:Recentchanges one of my Firefox home pages, and I check the recent changes made by IP-only (anonymous) users -(WT-en) Bijee 23:46, 9 Dec 2004 (EST)
Google's Comment Spam prevention
Re , any way to implement this at Wikivoyage? All we'd need is the attribute rel="nofollow" on all external links. (WT-en) Jpatokal 09:48, 19 Jan 2005 (EST)
- This is implemented in MediaWiki 1.4, I believe. We should be moving to that soon. --(WT-en) Evan 12:21, 10 Mar 2005 (EST)
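For reference, the idea behind rel="nofollow" is that search engines ignore such links when computing PageRank, so spammed links earn nothing. As a rough illustration only (MediaWiki 1.4 handles this internally; the function below is a made-up sketch, not its actual code), a post-processing pass over rendered HTML might look like:
 import re
 
 def add_nofollow(html: str) -> str:
     """Illustrative sketch: add rel="nofollow" to every external link."""
     def patch(match):
         tag = match.group(0)
         if 'rel=' in tag:          # leave tags that already carry a rel attribute
             return tag
         return tag[:-1] + ' rel="nofollow">'
     # only touch anchors whose href points at an external http(s) URL
     return re.sub(r'<a\s+[^>]*href="https?://[^"]*"[^>]*>', patch, html)
 
 print(add_nofollow('See <a href="http://example.com">this site</a>.'))
 # See <a href="http://example.com" rel="nofollow">this site</a>.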
6x●to
http://6x●to/ is a free web redirection service. This is the sort of place spammers tend to congregate. I would suggest any subdomain in the domain 6x●to be blocked until authorised by an administrator - or however it is done.
Terms of use state:
- 6x●to feels that spam in any form, including but not limited to, unsolicited commercial email, irc messages and newsgroup postings is a serious abuse of the network and will not tolerate 6x●to's name being used in such a way. If you chose to abuse the network or defame 6x●to in any way, your host name will be immediately deactivated and appropriate legal action will follow from 6x●to, your ISP and the related parties.
-- (WT-en) Huttite 05:03, 26 Jan 2005 (EST)
- Update: I sent an e-mail to abuse@6x●to and the reply indicates that the subdomain URL that was spammed on the Main Page on 26 Jan 2005 was closed as a result! Somebody with ethics!! -- (WT-en) Huttite 05:33, 27 Jan 2005 (EST)
- Have sent complaint to abuse@nic.uni●cc (last spam of Main Page, against uni●cc policy). Will see what happens. -- (WT-en) JanSlupski 13:54, 27 Jan 2005 (EST)
- Still lots more spam links from both servers coming in. I'd suggest preemptively blocking them. (WT-en) Jpatokal 00:58, 2 Feb 2005 (EST)
- I note that 6x●to are blocking the promoted URLs also!! I notice that the spam stops once 6x●to does this. E-mailing abuse@6x●to appears to be effective. I suspect that 6x●to is also having trouble keeping up, so blocking any 6x●to subdomain would be useful. -- (WT-en) Huttite 03:50, 2 Feb 2005 (EST)
uni●cc
uni●cc is a redirection service. A copy of their terms of service is at http://www.uni●cc/site/info_terms.php Reports of spammers abusing the service can be sent to mailto:abuse@nic.uni●cc -- (WT-en) Huttite 04:31, 8 Feb 2005 (EST)
- Tried to complain to that address before (27 Jan -- see above), but no answer, no results... :-( -- (WT-en) JanSlupski 06:25, 8 Feb 2005 (EST)
- I understand it may feel pointless to do this but I would suggest you post a copy of the complaint on the talk page for the IP address of the user that placed the spam too. It may not happen overnight but it MAY happen. If there is no response then the website links can always be chongqed. -- (WT-en) Huttite 06:50, 8 Feb 2005 (EST)
serverlogic3●com.
Can we ban serverlogic3●com? There is malware which molests the uploaded wikivoyage pages of unsuspecting victims (like User:(WT-en) Wonderfool) to include advertisements which pop up when the user mouses over the advertising term. The ad is retrieved from serverlogic3●com. So if we ban serverlogic3, we will at least prevent infected users from uploading evilified pages. -- (WT-en) Colin 19:52, 4 Mar 2005 (EST)
New ideas
So, I'd like to make it easier for us to add new items to the spam filter. Here's my plan:
- A new page, Project:Local spam blacklist, holds our local spam regular expressions that aren't on the CommunityWiki BannedContent list.
- Another new page, Project:Local spam whitelist, contains regular expressions that are on the banned content list at CW that we don't want to use. Example: the one with the pharmacy thing in it.
- A cron job updates our spam system each day -- it downloads the banned content list from CW, removes anything on the local whitelist, and adds everything on the local blacklist (see the sketch at the end of this section).
If it gets abused, we can protect the local pages, but my guess is that spammers aren't going to take the time to figure out our system and route around it. So I think we can leave them unprotected at first. I'm going to talk to the CW folks and see if our local regexps can feed up to the aggregate one (so everyone benefits from our experience).
Any comments or criticism on this are welcome. --(WT-en) Evan 12:20, 10 Mar 2005 (EST)
- The two local pages would be exempted from the spam rules, too. --(WT-en) Evan 13:00, 10 Mar 2005 (EST)
- Works for me. -- (WT-en) Mark 15:35, 10 Mar 2005 (EST)
- This is now implemented, with the exception that the local lists are checked real-time. --(WT-en) Evan 17:47, 13 Oct 2005 (EDT)
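For the record, a minimal sketch of what such a daily merge could look like, assuming each list is fetchable as plain text with one regular expression per line (the URLs below are placeholders, not the real page locations):
 import urllib.request
 
 # Placeholder locations -- substitute the real raw-text URLs of the three pages.
 BANNED_CONTENT_URL = "http://example.org/BannedContent?action=raw"
 LOCAL_BLACKLIST_URL = "http://example.org/Local_spam_blacklist?action=raw"
 LOCAL_WHITELIST_URL = "http://example.org/Local_spam_whitelist?action=raw"
 
 def fetch_patterns(url):
     """Treat every non-empty, non-comment line of the page as one regex."""
     with urllib.request.urlopen(url) as resp:
         text = resp.read().decode("utf-8")
     return [line.strip() for line in text.splitlines()
             if line.strip() and not line.strip().startswith("#")]
 
 def build_filter_list():
     banned = fetch_patterns(BANNED_CONTENT_URL)
     local_black = fetch_patterns(LOCAL_BLACKLIST_URL)
     local_white = set(fetch_patterns(LOCAL_WHITELIST_URL))
     # shared list minus our whitelist, plus our own blacklist
     return [p for p in banned if p not in local_white] + local_black
 
 if __name__ == "__main__":
     with open("spam_filter_patterns.txt", "w", encoding="utf-8") as out:
         out.write("\n".join(build_filter_list()) + "\n")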
Unable to save this page
As a side effect of the new spam filter (super!), you cannot save this page (as a whole) anymore... --(WT-en) JanSlupski 15:56, 10 Mar 2005 (EST)
- You can remove spam URLs from this page as they are added to the filter. (WT-en) Jpatokal 20:23, 10 Mar 2005 (EST)
Jobdao - Is it spam?
A user added http://www.jobdao●com/protest/vtest001_26.htm to the Main Page. The target website is in either Japanese or Chinese characters, so I cannot read it. But I think it is a Job/CV website. Is it spam? -- (WT-en) Huttite 05:12, 14 Jul 2005 (EDT)
- I think in the context of the Main Page and with no explanation, yes it is. -- (WT-en) Mark 05:16, 14 Jul 2005 (EDT)
University of Tokyo banned!?
Ooi! My alma mater, the University of Tokyo, is inexplicably banned. Can it be removed from the filter list? "u-tokyo.ac.jp" (WT-en) Jpatokal 09:48, 8 Aug 2005 (EDT)
- I added it to the local spam whitelist. --(WT-en) Evan 17:50, 13 Oct 2005 (EDT)
Sensitivity vs. Specificity
False Positives vs. False Negatives (FP:FN) is a classic problem in medical testing. This tension is expressed as Sensitivity vs. Specificity. The more sensitive a test is, the more likely you are to see false positives (Type I errors). The more specific a test is, the more likely you are to see false negatives (Type II errors). (This conundrum reminds me of the Uncertainty Principle.) We have the same problem with blacklists. (Whitelists counteract false positives.)
The true rates of error depend on testing accuracy and precision, and on the frequency of true positives in the population of interest. These factors can be addressed with Bayes' theorem, which is beyond the scope of this discussion.
In medical diagnostic testing, a common strategy is to screen with high-sensitivity tests and then to verify positives with high-specificity tests. This strategy has the benefit of reducing the cost of testing and minimizing the risk of false positives and false negatives.
In blacklists we can measure cost as the number of elements (words, URLs, patterns) that must be compared to new content. In order to strictly follow the medical model we would need a two-stage blacklist. The first-stage blacklist would have spam words, plus URLs that are not associated with spam words (e.g. \.5g6y\.info - this URL doesn't use spam words). The second-stage blacklist would have all known URLs associated with spammers.
We are forced to compromise by merging both types of tests into one blacklist, which is a finite resource with a 'price' for size. The 'price' is system loading, user inconvenience, and maintenance.
To recap (more% indicates percentage of a finite resource):
- more% blacklisted URLs => more specificity
- more specificity => more false negatives
- more false negatives => more permission for bad content
- more% blacklisted words => more sensitivity
- more sensitivity => more false positives
- more false positives => more blocks to good content
One minimax strategy for a single blacklist:
- Reduce the 'cost' by relying more on spam words that are associated with many spammer URLs
- Reduce the number of false positives by tuning the spam words with regex
- Reduce the number of false negatives by including spammer URLs that do not use spam words
Most blacklists depend heavily on banning URLs. Spammers have an easy time finding new URLs, which makes the effort open-ended.
I have developed a blacklist that uses banned words primarily. For more details, visit my user page:
--(WT-en) jwalling 17:10, 1 Jan 2006 (EST)
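To make the screen-then-confirm analogy concrete, here is a minimal two-stage sketch (all patterns are invented examples, not entries from any actual list):
 import re
 
 # Stage 1: cheap, sensitive screen -- spam words and oddball patterns.
 SCREEN_PATTERNS = [r"viagrapecia", r"height:\s*\dpx", r"\.5g6y\.info"]
 # Stage 2: specific confirmation -- known spammer URLs (invented examples).
 CONFIRM_PATTERNS = [r"spam-lyrics\.example", r"cheap-pills\.example"]
 
 def check_edit(text):
     """Return 'block', 'review', or 'allow' for a proposed edit."""
     confirmed = any(re.search(p, text, re.IGNORECASE) for p in CONFIRM_PATTERNS)
     screened = any(re.search(p, text, re.IGNORECASE) for p in SCREEN_PATTERNS)
     if confirmed:
         return "block"    # a known spammer URL is decisive on its own
     if screened:
         return "review"   # sensitive hit, not yet confirmed -- a human checks it
     return "allow"
 
 print(check_edit("Buy viagrapecia now!"))                   # review
 print(check_edit("Visit http://spam-lyrics.example ..."))   # block
 print(check_edit("A quiet guesthouse near the beach."))     # allow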
Rebuttal
- This list is only a short-term measure that enables Wikivoyageers to block spammers not detected by the bigger banned content list, which does include word terms. Some of those words are the same words we want to use, such as Casino. This list allows us to be extremely sensitive to specific spammers, forcing them to find new URLs. We want spammers to have to work hard for their money, meaning they have to go to lots of trouble to set up new URLs. If spammers realise that every URL that is considered spam will find its way onto the shared wiki master list faster than they can find wikis to spam, the effort stops being worthwhile. It also means we can ban them with less effort than setting up a new URL takes. To be useful, spam links need to survive a few days or even weeks so search engines can find them. I have found that it is much more effective to make a good, well-designed website that search engines like. Spamming is a wasted effort by the naive and ultimately harms their own interests. -- (WT-en) Huttite 18:15, 1 Jan 2006 (EST)
- Is there any effort to leverage spam words? My point is, why wait for new spammers to strike if a new spam word will prevent them from striking? For example - Spammer A leaves the new spam word viagrapecia. Block viagrapecia. Spammer B comes along to post viagrapecia and is blocked. If you only block Spammer A's URL, spammer B has a clear shot. --(WT-en) jwalling 19:40, 1 Jan 2006 (EST)
- Postscript - a better example of spam word blocking with regex is height:\s*\dpx; it's simple and effective. How many times have you seen
- <div id="yadayada" style = "overflow:auto; height: 1px; ">
- followed by a long list of spammer URLs?
- Perhaps there is scope to block any HTML construct that generates a hypertext link on a wiki without using the wiki features to do so (a rough sketch of such a check follows).
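As a concrete sketch of that, a check for the hidden-div trick could be as small as the following (the exact pattern is a made-up example, not one taken from the filter):
 import re
 
 # the classic hidden-spam container: an inline style forcing a tiny pixel height
 HIDDEN_DIV = re.compile(r'style\s*=\s*"[^"]*height:\s*\d{1,2}px', re.IGNORECASE)
 
 sample = '<div id="yadayada" style="overflow:auto; height: 1px;">spam links...</div>'
 print(bool(HIDDEN_DIV.search(sample)))                                 # True
 print(bool(HIDDEN_DIV.search("The viewpoint is about 300m uphill.")))  # False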
- I also think the logic is that a base URL is a bit harder to set up than the word or HTML in a URL link is. Though I agree it would be nice if we blocked words too, and I think we do. However I think that past experience has been that too much anticipatory blocking using spam words is too sensitive and gives too many false positives. On Wikivoyage almost any false positive is a BAD THING, as the Spam Filter causes BAD THINGS to happen to the rest of the edited text - bug? reported - and work is not recoverable - a design feature? I would rather one or two new ones slipped through the net, so I know who they are, than have legitimate websites being blocked and users inconvenienced because someone added a URL that was something like spam, e.g. Pornping in Chiang Mai.
- To use the medical analogy, this is like a vaccine for spam, rather than a broad spectrum antibiotic. In this case we target the precise examples rather than making the environment toxic. -- (WT-en) Huttite 20:12, 1 Jan 2006 (EST)
- Another thought. If we identify the spammer's URL then action can be taken to subvert the effect of spamming. This is known as being Chongqed by some and uses the URL and spammed keywords to outrank the spammed URL in search engines - in theory at least. -- (WT-en) Huttite 20:20, 1 Jan 2006 (EST)
- I haven't seen a single rebuttal to using height:\s*\dpx. I think people are so married to using URLs for blocking, they can't rethink the problem. By the way, Huttite, if that block was installed, your personal page would not have been spammed. --(WT-en) jwalling 20:25, 1 Jan 2006 (EST)
- I disagree that URLs are more stable than spam words. Many URLs are set up at free hosts for redirection. If you want to sell viagrapecia you have to use that term to get good SEO.
- If a spammer sets up on a free host for redirection, block the whole free host domain. The spammers can never use the same free host twice, nor can any other spammers. The number of free hosts is surely limited. If any valid sites use the domain they can be unblocked on a case-by-case basis. -- (WT-en) Huttite 20:40, 1 Jan 2006 (EST)
- What if the free host is Yahoo.com? I have seen that situation. I am not proposing that all blocks be done via spam words; I am looking for a rational balance. For every example there is a counterexample. Use what makes sense, like height:\s*\dpx - there is no valid reason for users to hide content, and if they must they can use <!-- -->. Another thought: before you add a spam word, you can search your wiki to see if it is in use. Remember, with regex you can fine-tune to prevent accidental matches. I am using a spam word list for blocking at KatrinaHelp.info and I have not seen a single, not one, spambot deposit since Dec 16, 2005, where we saw dozens in the previous weeks. --(WT-en) jwalling 20:55, 1 Jan 2006 (EST)
- In some respects, blocking spam entirely is censorship. If a Wiki is open to all, shouldn't anyone be allowed to put anything they want up so that others can judge it before it is taken down again? By blocking spam I do not hear what the spammer says, as they are filtered out. By only blocking spam URLs I hear them when they say something new. And if it is still spam they get blocked again. Surely everyone has the right to free speech, and a responsibility to say what others need to hear. Unfortunately spammers tend to abuse that right by shouting loudly so that others cannot be heard. There needs to be a balance between control and total exclusion. I think the current spam filter, for all its faults, strikes an appropriate balance. It may not be the best solution, but it does a relatively good job. Besides, we might want to use some spammed links, if they are travel related. Also, find me a page that still has non-travel related spam on the current revision that has been there long enough to also be found in a search engine. -- (WT-en) Huttite 21:32, 1 Jan 2006 (EST)
- I think you have made the most succinct case for openness. It's a good standard to follow. If your methods are successful and openness is the paramount concern, there is no need to change. --(WT-en) jwalling 21:40, 1 Jan 2006 (EST)
Just to throw in my 200 rupiah: if you look further up this very page you'll see that we tried keyword-based blocking earlier and it didn't work too well. Wikivoyage covers the entire planet, which includes the legit casinos of Macau and lots of Thai porn, as the word means "blessing" there... (WT-en) Jpatokal 23:13, 1 Jan 2006 (EST)
- I get it. Banned spam words bad. Banned spammer URLs good. In the meantime I will continue to explore the benefits of banning spam words on the wikis I maintain. One size does not fit all. --(WT-en) jwalling 15:46, 2 Jan 2006 (EST)
Localization
I wanted to point out, since I don't think it's noted elsewhere, that the names of the local spam filter pages can be localized with MediaWiki:spamwhitelist and MediaWiki:spamblacklist respectively. --(WT-en) Evan 16:22, 30 October 2006 (EST)
Essaouira-voyage
Added after reverts of two edits: , . Am I following the policy correctly? --(WT-en) DenisYurkin 13:17, 15 November 2007 (EST)
linking from error message to a place to discuss it
Can we link from the "save was blocked" message to a place where a user can ask a question?
Right now, it looks absolutely techy, and it takes serious effort to understand what's wrong and what I can do about it. Just try to save User:(WT-en) DenisYurkin before my comment in Project:Local spam blacklist#Catch-all pattern is processed, and you'll see what I mean. --(WT-en) DenisYurkin 17:08, 21 March 2008 (EDT)
First person pronouns
Per Project:First person pronouns. Would it be possible/desirable to ban all occurrences of first-person pronouns (I, us, me, we, my, mine, our, ours) used in-article via the spam filter? By, say, an expression banning the occurrence of {{isIn|}}/{{isPartOf|}} + any of the above pronouns? The only time this would cause problems, I think, is at Project:Breadcrumb_navigation, but it would be easy enough to purge it of "we".
Banning in-article use of first person pronouns would hit a lot of spam, as well as inappropriate blog-style contributions. I'm sure it would occasionally confuse a new editor, but I'd argue the net benefits > disadvantages. --(WT-en) Peter Talk 22:00, 18 May 2009 (EDT)
- Mm-mm, this would be so tricky to get right I'm not sure it's feasible, and it's not in Wikivoyage's best interest to start presenting obscure error messages to well-meaning contributors of the "We ate at this amazing little restaurant" type. I'm not sure the spam blacklist can parse across lines; even if it does, it'll be really computationally intensive. (WT-en) Jpatokal 22:28, 18 May 2009 (EDT)
- I also think the side effects would be hard to determine. "We" is mentioned in thousands of articles on the site. How about we start with a couple of small expressions, and see how we go? We could do "our guests" and "our hotel". That would get around 200 articles right now, and I've just checked a fair selection and all seem inappropriate. It would certainly limit the impact as a starting point. If it is effective, we could add others. --(WT-en) Inas 22:35, 18 May 2009 (EDT)
- Agreed that the first-person pronouns would be too harsh. It's possible that an article could have a quotation from literature like Mount Fuji#See or a public figure that would trip the spam filter (although the Fuji example would be clean). Would "your guests" be caught by a spam filter that was patrolling for "our guests"? (WT-en) Gorilla Jones 23:49, 18 May 2009 (EDT)
- I thought that this would be easy as a regex, but a closer look at the blacklist suggests the expressions apply to URLs only, so I don't think this extension will allow it. --(WT-en) Inas 00:22, 19 May 2009 (EDT)
- Yes, such a filter would catch that, as well as "tour guests" (which has at least some legitimate uses). The hotel version would also catch things like "four hotels", which is a phrase that shows up in at least 23 articles now, and could also trip over currently unseen but not implausible names like "Troubadour Hotel". Words and short phrases tend to generate false positives (there are an awful lot of UK placenames that get caught by poorly written profanity filters), and while that can be mitigated by increasingly unreadable embellishments to the expressions – perhaps ([^\w]|^)[Oo]ur – it's still a fragile solution. - (WT-en) Dguillaime 00:27, 19 May 2009 (EDT)
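A quick test of that embellished expression against the examples above (just an illustration, not a proposal for the actual list):
 import re
 
 naive = re.compile(r"our guests", re.IGNORECASE)
 bounded = re.compile(r"([^\w]|^)[Oo]ur guests")
 
 samples = [
     "Our guests enjoy free parking.",     # touting -- should match
     "We welcome all our guests warmly.",  # touting -- should match
     "Rooms sleep four guests.",           # innocent -- should not match
     "Tour guests meet in the lobby.",     # innocent -- should not match
 ]
 for s in samples:
     print(bool(naive.search(s)), bool(bounded.search(s)), s)
 # the naive pattern flags all four lines; the bounded one flags only the first two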
- Just as a test, I stomped on every occurrence of "our guests" on the wiki. I reviewed each change by hand (wouldn't run an unapproved bot, or anything), and I refined the logic as I went through the 100-odd out there. The final logic I applied went pretty much like this (a stripped-down sketch appears after this comment):
- 1. Locate the articles using the first-person pronoun phrase.
- 2. Check for a few key phrases, based on Project:Words to avoid, that could always be eliminated, and eliminate them.
- 3. Check for touting language in a full sentence, and then eliminate that sentence from a listing.
- 4. If the sentence can't be identified - not properly formatted - and it is an XML sleep entry, eliminate the description.
- 5. Check how many sleep entries exist, and if there are more than 15 eliminate the touting entry altogether, if it is an XML sleep entry.
- 6. Some formatting: make sure the URL entry is formatted correctly and that the name and description are mixed case; also fix a few common spelling errors (breakfast is never complementary).
- I have to say the test went pretty well. There was only one error: it hit a mission statement for a museum, and it was easy to tweak it not to make that type of correction outside of the sleep section.
- I'm happy just to keep it in the cupboard with my mop and bucket, but if anyone else thinks there could be potential here, I'm happy to investigate it further with a possible aim of it becoming a fully grown toutbot one day. With further development it could possibly notify a user, or only check unpatrolled edits, or whatever. I also have to say that there were some really gross breaches of policy there, with these phrases pointing to articles that had entire web pages pasted into them. The key to this working is the fact that a few phrases match a remarkable number of these listings. Any thoughts? --(WT-en) inas 06:51, 26 May 2009 (EDT)
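As mentioned above, a very stripped-down sketch of that kind of pass (the phrase list and the listing text are simplified stand-ins, not what the real script used):
 import re
 
 # a few of the phrases that commonly mark touting text (stand-in list)
 TOUT_PHRASES = ["our guests", "most luxurious", "perfect setting",
                 "virtual tour", "visit of a lifetime"]
 
 def clean_description(description):
     """Drop whole sentences containing a tout phrase; fix one common typo."""
     sentences = re.split(r"(?<=[.!?])\s+", description)
     kept = [s for s in sentences
             if not any(p in s.lower() for p in TOUT_PHRASES)]
     cleaned = " ".join(kept)
     # step 6 above: breakfast is never "complementary"
     return cleaned.replace("complementary breakfast", "complimentary breakfast")
 
 listing = ("Family-run hotel near the station. Our guests enjoy the most "
            "luxurious rooms in town! Rates include complementary breakfast.")
 print(clean_description(listing))
 # Family-run hotel near the station. Rates include complimentary breakfast.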
- I'd be a little hesitant about unleashing bots that go around and delete things on their own, but I could certainly see the value of one that trawls through existing articles flagging suspicious stuff, and perhaps even one that reverts suspicious new edits on sight. (WT-en) Jpatokal 12:06, 26 May 2009 (EDT)
- I'll keep playing for a while, and see what I can do. This edit, I think, gives a pretty good idea of what can be done. It removes image tags from sleep sections when they use a hotel name, and it will remove sentences containing spam phrases, like "visit of a lifetime", "perfect setting", "most luxurious", and "virtual tour", from sleep listings, and it will also modify the phrase majestic view. Of course there will always be false positives - that is the trade-off I guess - but with each one caught, the next becomes less likely, and it certainly left the rest of this article alone. Of course it would also have fixed the url, etc, if it was poorly formatted. --(WT-en) inas 01:12, 27 May 2009 (EDT)
- From surveying the articles, I've got around 70 phrases that commonly appear in touting sleep and eat listings, including first-person pronoun expressions and the other typical expressions we see. I have been running a script for the last day or so which checks one article every 10 minutes against the expression list. From the resulting list, so far I've edited a hundred or so, and all but two have been undesirable. The other two were probably good faith, although IMO still a little clichéd. So the hit ratio, while good, is probably not sufficient just to let it loose quite yet. After I've finished the trawl, I'll consider whether to continue, and create a page of articles needing review. --(WT-en) inas 00:37, 4 June 2009 (EDT)
Spam filter vs spam filter
The :es blacklist appears to be uneditable because it has blacklisted terms on the page (obviously). I noticed this problem when trying to add Allopurinol and friends to the pharmaceuticals list, as I did here. Any idea why this is happening? --(WT-en) Peter Talk 20:33, 21 June 2009 (EDT)
- I presume it's because es:MediaWiki:Spam-blacklist is blank; it should be set to "Wikivoyage:Local spam blacklist". (WT-en) Jpatokal 22:30, 21 June 2009 (EDT)
- Got it, thanks. For future reference, the system message is at MediaWiki:Spamblacklist. --(WT-en) Peter Talk 02:11, 22 June 2009 (EDT)
- Seems to work so I won't argue, but the docs say the message name should have the hyphen... (WT-en) Jpatokal 04:14, 22 June 2009 (EDT)
- It also should be included in the Special:Allmessages list. Something odd afoot. --(WT-en) Peter Talk 04:53, 22 June 2009 (EDT)
User registration swarm
[edit]Mates,
I think there is a growing problem with our user registration. Today a significant number of user accounts were created which were definitely created by spambots. The main reason I say so is that nearly every 30 minutes two accounts get created within two minutes of each other. Some of the usernames are the same as ones created at WT, too. What is our captcha level, and do we have a tool in place for when they mature in 30 days? Jc8136 (talk) 15:57, 26 October 2012 (CEST)
- Maybe you can check whether you can use MediaWiki:Titleblacklist to add excluding regex rules. I have asked Hansm to add Fancy captcha images but I do not know if he has time. --Unger (talk) 17:57, 26 October 2012 (CEST)
Title blacklist and "the..."
I request that an administrator redirect Theft to Crime. It's a valid, viable requested topic for an article but can't be created because a filter is preventing non-admins from creating any article name beginning with "The..." K7L (talk) 02:36, 10 May 2015 (UTC)
- Done. Nurg (talk) 02:47, 10 May 2015 (UTC)
- And I've removed "the" from the blacklist. Nurg (talk) 03:04, 10 May 2015 (UTC)
Spam blacklist
Could somebody please tell me how to find our blacklist for touts and spammers? I have triggered the filter several times tonight trying to add the website of a taxi firm to Farnborough. --ThunderingTyphoons! (talk) 21:24, 20 August 2019 (UTC)
- It's funny — I've been doing some wikidetective work lately and your edit summary actually made me suspicious. Then I realized! Maybe I need to stop over-thinking Wikivoyage problem editors?? --Comment by Selfie City (talk | contributions) 22:32, 20 August 2019 (UTC)