User talk:(WT-en) InterLangBot

From Wikivoyage

Hey Evan,

Could you run this SQL through the db please?

UPDATE user SET user_rights='bot' WHERE user_name='InterLangBot';

Thanks!

Using the Pywikipedia Bot Framework


The big advantage of the Pywikipedia bot framework is that it is well maintained. It is developed alongside the MediaWiki software, so its bots keep working after software upgrades (which my old ones did not). Furthermore, the Pywikipedia bots are somewhat more comfortable to use.

On the other hand, Wikivoyage requires some special features that the Pywikipedia interwiki bot doesn't support:

  • Our script policy says that every bot must always check two pages: 1. Wikivoyage:Script policy/Run and 2. <Bot's User Page>/Run.
  • We have to maintain WikiPedia links, which should appear at the bottom of the list of other languages even though they are not really interlanguage links.
  • The MediaWiki messages are turned off on Wikivoyage, but the bots rely on them.
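
The first requirement in the list above (both Run pages must allow the bot) can be sketched on its own, independent of the Pywikipedia code. This is only a minimal illustration: fetch_text is a hypothetical callable standing in for whatever page-fetching call the framework provides, and stripping whitespace and ignoring case are assumptions of this sketch, not something the script policy states.

```python
def edit_allowed(fetch_text, bot_user,
                 policy_run_page="Wikivoyage:Script policy/Run"):
    """Return True only if both Run pages currently say 'yes'.

    fetch_text is a hypothetical callable: it takes a page title and
    returns that page's wikitext as a string.
    """
    pages = [policy_run_page, "User:%s/Run" % bot_user]
    # Assumption: surrounding whitespace and letter case do not matter.
    return all(fetch_text(title).strip().lower() == "yes" for title in pages)
```

With a plain dictionary as a stand-in for the wiki, edit_allowed(pages.get, "InterLangBot") is true only while both entries contain "yes".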

I have made some more or less dirty hacks to three files of the Pywikipedia bot framework in order to teach the InterLangBot (a modified interwiki bot) Wikivoyage-like behaviour. Since the Pywikipedia software changes frequently, I think the best approach is to use patches that modify the most recent Pywikipedia software. The patches are in the "context diff" format.

Here are my patches, based on the packaged version of Pywikipedia from 2005-08-25. Copy and paste them into files. Don't omit blank lines and spaces!

Patch for "wikipedia.py"


Copy this into a file named wikipedia.py.patch.

*** wikipedia.py        Tue Oct 18 15:03:53 2005
--- wt_wikipedia.py     Tue Oct 18 15:02:33 2005
***************
*** 653,659 ****
              raise PageNotSaved('Bad status line: %s' % line)
          data = response.read().decode(self.site().encoding())
          conn.close()
!         if data != u'':
              # Saving unsuccessful. Possible reasons: edit conflict or invalid edit token.
              editconflict = mediawiki_messages.get('editconflict', site = self.site()).replace('$1', '')
              if '<title>%s' % editconflict in data:
--- 653,662 ----
              raise PageNotSaved('Bad status line: %s' % line)
          data = response.read().decode(self.site().encoding())
          conn.close()
!
!         # !Wikivoyage: Skip investigation why saving was unsuccessful since MediaWiki messages
!         # are turned off at wikivoyage. Otherwise, mediawiki_messages.get() exits the bot.
!         if False:
              # Saving unsuccessful. Possible reasons: edit conflict or invalid edit token.
              editconflict = mediawiki_messages.get('editconflict', site = self.site()).replace('$1', '')
              if '<title>%s' % editconflict in data:
***************
*** 1522,1528 ****
          insite = getSite()
      s = []
      for pl in links:
!         s.append(pl.aslink())
      if insite.category_on_one_line():
          sep = ' '
      else:
--- 1525,1533 ----
          insite = getSite()
      s = []
      for pl in links:
!         # !Wikivoyage: We misuse categories for WikiPedia and Dmoz links.
!         # So, avoid language prefix in any case.
!         s.append( '[[%s]]' % pl.title())
      if insite.category_on_one_line():
          sep = ' '
      else:

Patch for "interwiki.py"


Copy this into a file named interwiki.py.patch.

*** interwiki.py	Tue Oct 18 15:04:03 2005
--- wt_interwiki.py	Tue Oct 18 15:02:13 2005
***************
*** 789,795 ****
                          timeout=60
                          while 1:
                              try:
!                                 status, reason, data = pl.put(newtext, comment = wikipedia.translate(pl.site().lang, msg)[0] + mods)
                              except wikipedia.LockedPage:
                                  wikipedia.output(u'Page %s is locked. Skipping.' % pl.title())
                                  return False
--- 789,800 ----
                          timeout=60
                          while 1:
                              try:
!                                 # !Wikivoyage: Edit only, if edits from the bot are allowed.
!                                 if chkEditAllowed( pl):
!                                     status, reason, data = pl.put(newtext, comment = wikipedia.translate(pl.site().lang, msg)[0] + mods)
!                                 else:
!                                     wikipedia.output(u'Bot is not allowed to edit on "%s:". Skipping.' % pl.site().lang)
!                                     return False
                              except wikipedia.LockedPage:
                                  wikipedia.output(u'Page %s is locked. Skipping.' % pl.title())
                                  return False
***************
*** 1068,1073 ****
--- 1073,1092 ----
          sa.add(page, hints = hintStrings)
  
  #===========
+ 
+ # !Wikivoyage: Check User:InterLangBot/Run and Wikivoyage:Script policy/Run on the
+ # appropriate wiki.
+ 
+ def chkEditAllowed( pl):
+     botpage = wikipedia.Page( pl.site(), 'User:'+config.usernames['wikivoyage'][pl.site().lang]+'/Run')
+     bottxt = botpage.get()
+     genbotpage = wikipedia.Page( pl.site(), pl.site().family.genbotrunpage[pl.site().lang])
+     genbottxt = genbotpage.get()
+     return bottxt == 'yes' and genbottxt == 'yes'
+ 
+ #===========
+ 
+ 
          
  globalvar=Global()
      

Patch for "login.py"


Copy this into a file named login.py.patch.

*** login.py    Tue Oct 18 15:03:47 2005
--- wt_login.py Tue Oct 18 15:02:53 2005
***************
*** 124,130 ****
                  n += 1
                  L.append(m.group(1))

!         if len(L) == 4:
              return "\n".join(L)
          else:
              return None
--- 124,131 ----
                  n += 1
                  L.append(m.group(1))

!         # !Wikivoyage: For some strange reason, wikivoyage sends 3 of 4 cookies twice.
!         if len(L) == 7:
              return "\n".join(L)
          else:
              return None
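
The number 7 in the login.py patch follows from the comment in it: three of the four cookies arrive twice, so 3 × 2 + 1 = 7 entries are collected. A less brittle check might count distinct cookie names instead of raw entries. The sketch below is not part of Pywikipedia; it assumes the collected entries are 'name=value' strings, and the function name is hypothetical.

```python
def has_all_cookies(cookies, expected=4):
    """Return True if the list holds exactly `expected` distinct cookie names.

    cookies is assumed to be a list of 'name=value' strings; duplicates
    of the same name are counted once.
    """
    names = set(c.split("=", 1)[0] for c in cookies)
    return len(names) == expected
```

Such a check would accept both the usual 4 entries and the 7 entries seen on Wikivoyage, as long as four different cookies are present.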


Applying the patches


Well, I'm not so familiar with this stuff, so sorry for any inconvenience. Apply the patches like this:

  1. Download the packaged version of the Pywikipedia bot framework from 2005-08-25 (if you are keen enough, you can also try the most recent CVS version).
  2. Copy the above patches into files in the same directory.
  3. Change into the pywikipedia directory:
  $ cd pywikipedia
  4. For each of the 3 files, run a patch command like this:
  $ patch -p0 wikipedia.py wikipedia.py.patch
  $ patch -p0 login.py login.py.patch
  $ patch -p0 interwiki.py interwiki.py.patch
  5. Knock on wood and hope for the best. Check the warnings of each patch command. If it tells you something about offsets and fuzz factors, that's all right: the line numbers in the patches have shifted due to changes in the Pywikipedia software. If you get an error, Pywikipedia development has touched one of the lines that my patches change, and the patches should then be regarded as outdated.
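
If you would rather find out whether the patches still apply before anything is modified, the steps above could be wrapped in a small script that first does a trial run. This is only a sketch: it assumes GNU patch, whose --dry-run option checks a patch without changing any file, and the helper names are made up for this example. Run it from inside the pywikipedia directory.

```python
import subprocess

PATCHED_FILES = ["wikipedia.py", "login.py", "interwiki.py"]

def patch_command(target, dry_run=False):
    """Build the patch command line for one file and its .patch file."""
    cmd = ["patch", "-p0"]
    if dry_run:
        cmd.append("--dry-run")  # GNU patch: check only, change nothing
    cmd += [target, target + ".patch"]
    return cmd

def apply_all(run=subprocess.call):
    """Dry-run all patches first; apply them only if every one succeeds."""
    for name in PATCHED_FILES:
        if run(patch_command(name, dry_run=True)) != 0:
            print("Patch for %s does not apply; nothing was changed." % name)
            return False
    for name in PATCHED_FILES:
        if run(patch_command(name)) != 0:
            return False
    return True

# Usage: call apply_all() from the pywikipedia directory.
```

Offset and fuzz warnings still count as success here, because patch exits with status 0 in that case; only rejected hunks stop the script.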

File "families/wikivoyage_family.py"


This is not a patch, but the whole file. Replace the outdated one shipped with Pywikipedia with this one:

# -*- coding: utf-8  -*-
import family, config
    
# The wikimedia family that is known as wikivoyage

# Translation used on all wikivoyages for the 'article' text.
# A language not mentioned here is not known by the robot

class Family(family.Family):
    name = 'wikivoyage'
    
    def __init__(self):
        family.Family.__init__(self)
        self.langs = {
            'de':'de',
            'en':'en',
            'fr':'fr',
            'ja':'ja',
            'ro':'ro',
            'sv':'sv',
        }
        self.namespaces[4] = {
            '_default': 'Wikivoyage',
        }
        self.namespaces[5] = {
            '_default': 'Wikivoyage talk',
            'de': 'Wikivoyage Diskussion',
        }

        # Put [[WikiPedia:...]] and [[Dmoz:...]] links at the end. Categories are misused for that.
        self.categories_last = self.alphabetic

        # Add names for script control pages here, one for each language.
        self.genbotrunpage = {
            '_default': u'Wikivoyage:Script policy/Run',
            'en'  : u'Wikivoyage:Script policy/Run',
            'de'  : u'Wikivoyage:Regeln für Skripte/Run',
            'fr'  : u'Wikivoyage:Règles concernant les scripts/Run',
        }


    # A few selected big languages for things that we do not want to loop over
    # all languages. This is only needed by the titletranslate.py module, so
    # if you carefully avoid the options, you could get away without these
    # for another wikimedia family.

    biglangs = ['en','fr','ro']

    def hostname(self,code):
        return 'wikivoyage.org'

    def path(self, code):
        return '/wiki/%s/index.php' % code

    def version(self, code):
        return "1.4.11"



    # Treat [[WikiPedia:...]] and [[Dmoz:...]] links as categories.
    def category_namespaces(self, code):
        namespaces = family.Family.category_namespaces(self, code)
        namespaces.append('WikiPedia')
        namespaces.append('Dmoz')
        return namespaces
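
To see what hostname() and path() in the file above amount to, here is a rough sketch of how the two values fit together into the address of a language version's index.php. How exactly the framework combines them internally is an assumption here; FamilySketch, script_url and the http scheme are made up for the illustration.

```python
class FamilySketch(object):
    """Stand-in carrying only the two methods from the family file above."""

    def hostname(self, code):
        return 'wikivoyage.org'

    def path(self, code):
        return '/wiki/%s/index.php' % code

def script_url(family, code, scheme='http'):
    """Join scheme, host name and script path for one language code."""
    return '%s://%s%s' % (scheme, family.hostname(code), family.path(code))
```

For example, script_url(FamilySketch(), 'de') gives 'http://wikivoyage.org/wiki/de/index.php': all language versions share one host name and differ only in the path.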

File "user-config.py"


Create a file user-config.py with the following contents in the pywikipedia directory.

mylang = 'en'
family = 'wikivoyage'
usernames['wikivoyage']['de'] = 'InterLangBot'
usernames['wikivoyage']['en'] = 'InterLangBot'
# Uncomment other languages if you want to edit pages in those language versions.
# usernames['wikivoyage']['fr'] = 'InterLangBot'
# usernames['wikivoyage']['ro'] = 'InterLangBot'
# usernames['wikivoyage']['sv'] = 'InterLangBot'
# usernames['wikivoyage']['ja'] = 'InterLangBot'

NOTE: This config file is for the user InterLangBot, but you can safely use another user account for which you know the password.

Using the Bot


First of all, you should log in to all language versions in which you want the bot to edit pages. Type

 # python login.py -all

You will be asked for InterLangBot's password for each language version.

Once you are logged in, you can start the actual bot:

 # python interwiki.py -autonomous -start:!

This will read the list of regular articles on the wiki declared by the mylang setting in the above user-config.py file. The interwiki links of each article are read and followed. If the bot finds new languages for the same subject, the links are added to all language versions you have logged in to before. Watch the bot and hope it will behave kindly. See http://meta.wikimedia.org/wiki/Interwiki.py for more info.
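
The idea of reading and following interwiki links can be pictured as a walk over a graph of pages. The sketch below is only a conceptual illustration, not how interwiki.py is actually written: it ignores conflicts, redirects and everything else the real bot has to handle, and the links dictionary is a hypothetical stand-in for pages fetched from the wikis.

```python
from collections import deque

def collect_languages(start, links):
    """Follow interwiki links breadth-first from one page.

    start is a (lang, title) pair; links maps a (lang, title) pair to
    the list of (lang, title) pairs it links to. Returns a dict with
    the first title found for each language.
    """
    found = {start[0]: start[1]}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for lang, title in links.get(page, []):
            if lang not in found:
                found[lang] = title
                queue.append((lang, title))
    return found
```

If en:Rome links only to de:Rom, and de:Rom links to fr:Rome, the walk started at en:Rome also finds fr:Rome, which is the kind of new link the bot would then add to the other language versions.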

-- (WT-en) Hansm 12:20, 18 Oct 2005 (EDT)

Good work! Is this running right now? Would it be useful to run it on the wikivoyage.org server on a regular (daily?) basis? --(WT-en) Evan 11:07, 19 Oct 2005 (EDT)
Hi Evan, yes, I have run this bot during the last few days. It has made some hundreds of links to de:, fr: and ja:. But I felt like watching the bot's work closely. In the end, I can say that it seems to work well. For sure, it would save a lot of bandwidth if it ran directly on the wikivoyage.org server. But if it runs as a cron job, there is one very important point to take care of: after an upgrade of the MediaWiki software, the bot needs to be observed closely. There might always be some slight changes in the wiki server's behaviour that could easily knock out the bot. As far as I have studied its code, it would probably not turn into a spam monster, but it's quite possible that it bails out.
To my great disappointment, I had to realize that the most recent CVS version of Pywikipedia already does not run properly against wikivoyage.org with my patches. There seems to be some "improvement" that relies on server behaviour that wikivoyage.org doesn't have. Another painful debugging session looms on the horizon.
Summary: If you apply the above patches to the packaged version of Pywikipedia from 2005-08-25, the bot will work well and could be run as a cron job, as long as you do not upgrade the MediaWiki software. Since the bot runs for about 5 to 8 hours on each language version (de:, fr:, ja:), a total run over each of the 6 languages (including the big en:) would need about 2 days. So, a weekly or monthly cron job would be reasonable.
-- (WT-en) Hansm 04:08, 21 Oct 2005 (EDT)


Bad Reichenhall


Can you do Bad Reichenhall in German for me? (WT-en) Kingjeff 14:49, 3 Jan 2006 (EST)