Import XML Dumps to Your MediaWiki Wiki



MediaWiki uses an XML-based format for content dumps. This is what Special:Export generates, and it is also the format used for the XML dumps of Wikipedia and other Wikimedia sites. These dumps can be imported into another MediaWiki wiki via the Special:Import page or with tools such as mwdumper or xml2sql.

You may find yourself needing to import XML dumps into your wiki at one time or another. Here are some commonly used methods for importing them.

Steps

  1. Special:Import is a feature of the MediaWiki software that can be used by sysops (by default) to import a small number of pages (anything below 20 MB should be safe). Trying to import large dumps this way may result in timeouts or connection failures. This happens for two reasons:
    • The PHP upload limit, set in the PHP configuration file php.ini
    • The hidden form field that limits the upload size, found in the MediaWiki source code at includes/SpecialImport.php


    You could raise the PHP limit by adding a few lines to php.ini; a sketch follows.
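    A minimal sketch, assuming you want to allow dumps of up to roughly 50 MB (the directive names are standard PHP settings; the values are only illustrative):

        ; raise the per-file upload cap and the overall POST body cap
        upload_max_filesize = 50M
        post_max_size = 50M
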
  2. If you have shell access, you can use importDump.php. Although it is the recommended method, it gets slow when importing huge dumps; if you are trying to import something as huge as a Wikipedia dump, use mwdumper instead. importDump.php is a command line script located in the maintenance folder of your MediaWiki installation.


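    A typical invocation, run from the maintenance directory (this assumes the PHP command line interpreter is on your path):

        php importDump.php <dumpfile>
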
    where <dumpfile> is the name of your dump file. If the file is compressed with a .bz2 or .gz extension, it is decompressed automatically.
  3. For large data sets, try using mwdumper. It is a Java application that can read, write and convert MediaWiki XML dumps to SQL dumps (for later use with mysql or phpMyAdmin), which can then be imported into the database directly (a sample invocation is shown below). It is much faster than importDump.php; however, it only imports the revisions (page contents) and does not update the internal link tables accordingly, which means that category pages and many special pages will show incomplete or incorrect information unless you update those tables.
    • If available, you can fill the link tables by importing separate SQL dumps of those tables with the mysql command line client directly. For Wikimedia wikis (including Wikipedia), these dumps are provided along with the XML dumps. Otherwise, you can run rebuildall.php, which takes a long time because it has to parse all pages; this is not recommended for large data sets.
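    A sketch of a typical run, piping mwdumper's SQL output straight into MySQL (the jar, dump and database names and the user account are placeholders; --format=sql:1.5 targets the MediaWiki 1.5+ schema):

        java -jar mwdumper.jar --format=sql:1.5 pages_full.xml.bz2 | mysql -u wikiuser -p wikidb

        # if a matching link-table dump is available, it can be loaded the same way
        gunzip -c enwiki-latest-categorylinks.sql.gz | mysql -u wikiuser -p wikidb
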
  4. Xml2sql is another XML-to-SQL converter, similar to mwdumper, but it is not an official tool and is not maintained by the MediaWiki developers. It is a multi-platform ANSI C program. Importing this way may be fast, but it does not update secondary data such as the link tables either, so you need to run rebuildall.php, which nullifies that advantage.
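    If you do end up needing to rebuild the link tables after using either converter, the script also lives in the maintenance directory; a minimal run (expect it to take a long time on a large wiki) looks like this:

        php rebuildall.php
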

Tips

  • To run importDump.php (or any other tool from the maintenance directory), you need to set up your AdminSettings.php file; a minimal sketch appears after this list. For MediaWiki 1.16 and later this restriction no longer applies, so just ignore it.
  • Be warned that Xml2sql may be incompatible with the latest version of MediaWiki.
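
  A minimal sketch of AdminSettings.php for MediaWiki 1.15 and earlier, assuming a database account with sufficient privileges (the user name and password below are placeholders):

      <?php
      # database account used by the maintenance scripts; replace with your own credentials
      $wgDBadminuser     = 'wikiadmin';
      $wgDBadminpassword = 'secret';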

Warnings

  • Running importDump.php can take quite a long time. For a large Wikipedia dump with millions of pages, it may take days, even on a fast server.
    • Similarly, using Special:Import is not recommended for large data sets.
