RegioWikiCamp 2011/Sessions/Wiki backups
From Regiowiki
This talk was given in the #rwc11 channel on Freenode on Sept 3, 2011. It focuses on wiki backups and on using the WikiTeam tools (Tutorials).
You can use the talk page for any questions. You can also contact me at emijrp@gmail.com if you need help making backups of your wiki.
- 14:16:05 <emijrp> hi all
- 14:16:22 <emijrp> is this being shown in the projector?
- 14:16:29 <friedel> yes it is
- 14:16:36 <emijrp> can we start?
- 14:17:12 <emijrp> I'm opening my notes
- 14:17:12 <friedel> we're sitting in a small conference room and can see everything that's written
- 14:17:37 <emijrp> ok
- 14:17:55 <emijrp> Hi all. I'm glad I can give a talk at RegioWikiCamp 2011. My name is Emilio, although I'm registered on most wikis as emijrp. I'm from Spain, and I started contributing to the Spanish Wikipedia in 2005.
- 14:18:22 <emijrp> I like technology and I'm skilled in programming, so I have developed some tools for Wikipedia (maintenance bots, an anti-vandalism bot, etc.). That was some years ago; now I'm focused on developing tools to generate statistics and to back up wikis.
- 14:19:21 <emijrp> This talk is about making backups of wikis. When I finish, I hope you will understand the importance of backups and have some tips and tools for making them. The talk will last about 10 or 15 minutes. If you have any questions, friedel can type them in.
- 14:19:45 <friedel> we are not programmers but wiki and linux admins
- 14:20:10 <emijrp> OK. That is nice.
- 14:20:33 <emijrp> If you want to create a wiki community, there are two main ways: 1) paying for a server and installing a wiki engine (MediaWiki or similar), or 2) signing up with a wikifarm. Once your wiki is up, you work on it, other people join your community, your wiki flourishes and all is great.
- 14:21:12 <emijrp> But your data sits on metal platters spinning thousands of times per minute, for days, weeks, months. And hard disks fail. Natural disasters also happen (earthquakes, floods, fires).
- 14:21:57 <emijrp> I'm talking about hard disks, you know ; )
- 14:22:08 <emijrp> All your work may vanish. In a second. Forever.
- 14:22:32 <emijrp> Don't believe it? ScribbleWiki.com was a wikifarm that hosted hundreds of wikis. It lost all of them in a server crash, and its operators weren't able to recover the data from backups. Furthermore, no users had backups, because wikifarms don't offer an easy way to get public backups.
- 14:23:03 <emijrp> Did you know about this case?
- 14:23:30 <friedel> nope
- 14:23:57 <friedel> but we are aware of the problem
- 14:24:18 <emijrp> OK, you can read more about the history of that former wikifarm here: http://wikiindex.org/ScribbleWiki
- 14:24:24 <friedel> and we've been talking about other online services that shut down because of server crashes
- 14:25:01 <emijrp> Yes. You can never trust others when it comes to your data or your wiki.
- 14:25:30 <emijrp> I guess many of you are frequent wiki users, and you probably own a wiki.
- 14:26:03 <emijrp> MediaWiki has two tools for making backups, dumpBackup.php (http://www.mediawiki.org/wiki/Manual:DumpBackup.php) and the Special:Export page (http://www.yourwiki.com/wiki/Special:Export).
- 14:26:54 <emijrp> Have you heard of them? Do you use them frequently?
- 14:27:27 <emijrp> How many of your wikis have backups? Do you offer public backups for your users?
- 14:28:03 <friedel> we don't use them. I know them, but I do MySQL dumps + a tar of the wiki directory
- 14:28:29 <emijrp> I will paste this entire conversation on the RegioWiki wiki later, so don't worry about taking notes. Reply to the questions when you can.
- 14:29:06 <friedel> we don't provide public backups; we do daily backups on the server, which are downloaded once a week
- 14:29:47 <emijrp> Does everyone in the room follow those "guidelines"?
- 14:29:50 <friedel> thank you for pasting (so I don't have to do so ;)
- 14:30:27 <friedel> we don't have guidelines.. we all do it as we think it's best
- 14:30:50 <friedel> "we like to live dangerous..." ;)
- 14:31:18 <emijrp> OK. I'm going to describe those two tools briefly, and later talk about a new tool.
- 14:31:30 <friedel> ok
- 14:32:03 <emijrp> dumpBackup.php (http://www.mediawiki.org/wiki/Manual:DumpBackup.php) is a script that runs from the command line, so you need server access. It can back up all your articles and pages in a single batch. But you can only run dumpBackup.php if you own or rent the server, have shell access, and have some UNIX skills. So it is not an option for people using wikifarms or with limited technical skills.
- 14:32:46 <emijrp> And the second one...
- 14:33:00 <emijrp> Special:Export (http://www.yourwiki.com/wiki/Special:Export) is a MediaWiki interface for exporting single pages, or batches of pages from a category. As you can see, this option is only practical when your wiki is tiny. Also, images are not backed up with this tool.
- 14:33:22 <emijrp> How large are your wikis?
- 14:33:56 <emijrp> Number of pages and images.
- 14:34:02 <emijrp> Approx.
- 14:34:03 <friedel> my project has >5000 articles + >3000 pictures
- 14:34:26 <friedel> wiki-brest >3000 articles + >7000 pictures
- 14:34:53 <emijrp> Nice. You have successful wikis : D
- 14:34:55 <friedel> tuepedia >2000 articles, > 1500 pictures + 3 soundfiles
- 14:35:27 <emijrp> So we have dumpBackup.php, which is only for skilled people, and Special:Export, which only backs up a few pages and no images. It is not a good situation.
- 14:35:55 <emijrp> For some months now I have been worried about the short life of websites. Every day, thousands of websites are lost forever (admins don't renew domains, servers crash, crackers destroy data that has no backups, disasters happen, etc.).
- 14:36:08 <friedel> did you contribute to those mediawiki scripts?
- 14:36:40 <emijrp> No. Those scripts are part of MediaWiki itself. I have developed a new tool for backups, but it is not included in MediaWiki.
- 14:37:05 <emijrp> The tool is free software, licensed under the GPL.
- 14:37:15 <emijrp> : )
- 14:37:46 <emijrp> I guess you know a bit about licenses, right? Wiki people usually do.
- 14:38:17 <frogpond> yep, CC and all ;)
- 14:38:37 <emijrp> Nice.
- 14:38:48 <emijrp> Other people before me have worried about disappearing websites. For example, the Internet Archive (http://www.archive.org) saves snapshots of a huge number of sites from time to time. You can see, for example, what Google looked like in 1998: http://web.archive.org/web/19981202230410/http://www.google.com/
- 14:39:30 <emijrp> But snapshots work well only for static HTML sites. Wikis change quickly, they keep a history for every page, and the wikitext is different from the generated HTML. So web crawlers don't save wikis correctly.
- 14:40:28 <emijrp> Web crawlers are the programs used by Google for Google Cache, by the Internet Archive for its Wayback Machine (the 1998 Google link), etc.
- 14:40:50 <emijrp> When I saw this, I thought I had to do something to solve the issue, so I started developing the WikiTeam tools: http://code.google.com/p/wikiteam/
- 14:41:27 <emijrp> When I paste a link, you can open it; I'll leave a few seconds for you to look at it.
- 14:42:15 <emijrp> As you can see at the top of the site, WikiTeam in a nutshell:
- 14:42:48 <emijrp> There are thousands of wikis on the Internet. Every day some of them stop being publicly available and, due to a lack of backups, are lost. Many people download tons of media files (music, books, etc.) from the Internet, implementing a kind of distributed backup. Wikis, most of them under free licenses, disappear from time to time because nobody grabbed a copy of them. That is a shame that we would like to fix.
- 14:43:18 <emijrp> Hi frogpond, I guess you are in the room with friedel?
- 14:44:11 <emijrp> The WikiTeam tools are scripts that save all the pages and all the images of a MediaWiki wiki, and *you don't need server access!* So you can save your own wiki on a paid server, your own wiki on a wikifarm, or any other wiki you want (whether you own it or not). This is a great step forward.
- 14:44:50 <emijrp> (A note: for now the WikiTeam tools only work with MediaWiki wikis, MediaWiki being the most widely used wiki engine in the world, but we want to add support for other wiki engines.)
- 14:45:27 <emijrp> The WikiTeam tools generate an XML file with the full history of all your pages, plus a directory with all your images.
- 14:45:55 <emijrp> The code is available here (http://code.google.com/p/wikiteam/source/browse/trunk), and you can see all the wikis we have backed up here (http://code.google.com/p/wikiteam/downloads/list).
- 14:46:26 <emijrp> Later we are going to do some tests, and I will give a demo, so don't worry about the code now.
- 14:46:51 <emijrp> Since April 2011, we have backed up more than 100 wikis.
- 14:47:33 <friedel> question: can you backup every mediawiki completely from remote?
- 14:48:03 <emijrp> If it is public, meaning reachable from any PC with an Internet connection, yes.
- 14:48:23 <emijrp> Private wikis on local networks, no.
- 14:48:33 <friedel> does it have to be configured with API on?
- 14:49:02 <emijrp> It is better if the API is enabled, but it works without the API too.
- 14:49:09 <emijrp> I recommend using the API.
- 14:49:54 <emijrp> There can be problems when backing up wikis that run old MediaWiki versions.
- 14:50:14 <emijrp> The current MediaWiki release is 1.17, and 1.18 is in development.
- 14:51:06 <emijrp> I mean, problems may appear with very old versions of MediaWiki (1.10, 1.9 or so), because their API is missing or very limited.
- 14:51:23 <emijrp> What MediaWiki versions do you use?
- 14:52:00 <emijrp> You can see it in the special page Special:Version
- 14:52:12 <emijrp> Yours, friedel, for example?
- 14:52:14 <friedel> 1.16.*
- 14:52:20 <emijrp> Nice.
- 14:52:24 <friedel> all of us
- 14:52:42 <emijrp> Since WikiTeam was founded, we have also developed other tools, such as a downloader for the Wikipedia dumps (which are huge but worth saving).
- 14:53:19 <friedel> in what language your scripts are written
- 14:53:31 <friedel> ?
- 14:53:59 <emijrp> English.
- 14:54:02 <friedel> (what interpreter do we need?)
- 14:54:05 <emijrp> Python.
- 14:54:12 <friedel> php/python/perl??
- 14:54:15 <friedel> ok
- 14:54:29 <friedel> version?
- 14:54:42 <emijrp> Do you use Linux or Windows? I have tested the script on Linux and it works. It should work on Windows too, but I have not tested it there.
- 14:54:59 <emijrp> Python 2.7
- 14:55:17 <friedel> thank you
- 14:56:00 <emijrp> Also, you can subscribe to the mailing list here (http://groups.google.com/group/wikiteam-discuss). People post questions, others reply, and we get good ideas for new features.
- 14:56:24 <emijrp> Traffic is low, so it won't flood your inbox.
- 14:56:39 <emijrp> If you have any further questions, please ask and I will try to answer. We can leave a few minutes for questions and then do some tests with the WikiTeam tools.
- 14:57:12 <emijrp> Are you using Windows or Linux?
- 14:57:23 <friedel> q: how much work is it to recreate a wiki from a backup made with your tools?
- 14:57:39 * friedel is using linux
- 14:57:47 <friedel> most of us linux + some windows
- 15:01:11 <emijrp_> Sorry, my internet connection failed.
- 15:01:17 <emijrp_> Paste your last lines.
- 15:01:23 <friedel> most of us linux + some windows
- 15:01:26 <friedel> q: how much work is it to recreate a wiki from a backup made with your tools?
- 15:01:31 <emijrp_> OK
- 15:02:00 <emijrp_> You have to install a blank MediaWiki, upload the XML file and the packed images, import the XML with importDump.php, and unpack the images directory.
- 15:02:14 <emijrp_> Of course, you will need to reinstall all your extensions and so on.
- 15:02:18 <friedel> ok
- 15:02:45 <emijrp_> Recreating a wiki may be a bit difficult for a newbie. But the important point here is that you have the data saved.
- 15:03:03 <emijrp_> Without the data, you can't restore it : ).
- 15:03:18 <friedel> do you plan an MW extension to access its directory + extensions?
- 15:03:42 <emijrp_> I don't understand.
- 15:04:20 <emijrp_> The WikiTeam tools save the XML with the pages, plus the images.
- 15:04:31 <friedel> we'd like to know if it's possible to back up LocalSettings.php + the extensions directory
- 15:04:49 <emijrp_> No. You need to have server access for that.
- 15:05:00 <emijrp_> You can't do that remotely, for any wiki.
- 15:05:36 <emijrp_> Do you use many extensions?
- 15:05:37 <friedel> do you save history of articles?
- 15:05:42 <emijrp_> Yes. Full history.
- 15:06:06 <emijrp_> There is also an option to save only the latest revision, if you don't need the full history.
- 15:06:07 <friedel> quite a lot... 8 to 15
- 15:06:22 <friedel> ^ extensions
- 15:06:23 <emijrp_> And how did you install them?
- 15:07:03 <friedel> as you do.. tar xvzf.. + an entry in LocalSettings.php
- 15:07:05 <emijrp_> I mean, the extensions' code is in the MediaWiki repository, so you don't have to save it.
- 15:07:17 <emijrp_> The code of your extensions will always be available.
- 15:07:29 <friedel> ok.
- 15:07:50 <emijrp_> We have to be worried about the data that is unique: your articles and your images.
- 15:08:32 <emijrp_> That said, if you can save the extensions/ directory, that's better, because you can reinstall your wiki more easily.
- 15:09:06 <friedel> ok. can you show us now the usage of wikiTools?
- 15:09:18 <emijrp_> Sure : D
- 15:09:34 <emijrp_> OK, we can start with a demo. I'm going to work on Ubuntu Linux. If you use Windows, you can try it too, but I have not tested it on that operating system. If you get any errors, please stop me.
- 15:09:53 <emijrp_> We need to install the Python interpreter. On Ubuntu, just type: sudo apt-get install python. On Windows, download and install this: http://python.org/ftp/python/2.7.2/python-2.7.2.msi
- 15:10:11 <emijrp_> Who is following my steps? Please, tell me when you finish.
- 15:11:06 <friedel> emilio ... this is frogpond typing ....
- 15:11:18 <emijrp_> Hi
- 15:11:38 <friedel> thank you very much so far ... but we're out of time and we need to run to get the boat
- 15:11:42 <friedel> ;(
- 15:12:05 <emijrp_> OK, I can paste the rest of the talk in the wiki.
- 15:12:10 <emijrp_> Is everyone leaving?
- 15:12:31 <friedel> this would be really cool - thank you
- 15:12:39 <friedel> oh here's Friedel again
- 15:13:28 <emijrp_> So, is everyone leaving, or do some of you have more time to read?
- 15:13:48 <emijrp_> If you have to leave, no problem, we can talk later or tomorrow.
- 15:13:51 <friedel> two people are reading, the rest will leave
- 15:14:02 <friedel> ok. thank you very much :)
- 15:14:22 <friedel> and please paste the log to the wiki.regiowiki.eu
- 15:14:28 <emijrp_> Will the people who are leaving come back tomorrow?
- 15:15:32 <friedel> not all
- 15:15:35 <friedel> ...
- 15:15:41 <emijrp_> OK, no problem.
- 15:16:06 <emijrp_> Have you installed Python? What are you using, Linux or Windows?
- 15:16:17 <friedel> i use linux
- 15:16:34 <emijrp_> OK, if you have Python installed, open a console.
- 15:17:01 <emijrp_> But first, we have to download the WikiTeam tools.
- 15:17:09 <emijrp_> The file is here: http://wikiteam.googlecode.com/svn/trunk/dumpgenerator.py Right-click, "Save as..." dumpgenerator.py. Save it on your desktop or in any other directory, but remember where!
- 15:17:23 <friedel> python version 2.5 (Ubuntu 8.04 LTS)
- 15:17:30 <emijrp_> I think that's OK.
- 15:18:05 <emijrp_> Since we are at RegioWikiCamp, we can try to back up this wiki: http://wiki.regiowiki.eu ; )
- 15:18:30 <emijrp_> Linux users have to open a console. Windows users have to click on Start menu -> Run -> cmd.exe (press Enter) -> a black window appears
- 15:19:01 <emijrp_> Go to the directory where you saved dumpgenerator.py, using the "cd" command to change directory.
- 15:19:27 <emijrp_> When you are in the directory with dumpgenerator.py, type this:
- 15:19:33 <emijrp_> python dumpgenerator.py --api=http://wiki.regiowiki.eu/api.php --index=http://wiki.regiowiki.eu/index.php5 --xml --images
- 15:20:25 <emijrp_> I have tested it, and it is OK. The backup is generated in a directory called wikiregiowikieu-...
- 15:20:34 <emijrp_> Try it. I'll be back in 10 minutes.
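friedel's routine at 14:28:03 (MySQL dumps plus a tar of the wiki directory) could be scripted along these lines. This is a minimal Python 3 sketch under stated assumptions, not anyone's actual setup. The database name, user and paths are hypothetical. It also assumes the mysqldump client is installed on the server and reads its password from an option file such as ~/.my.cnf.

```python
import subprocess
import tarfile
from pathlib import Path


def mysqldump_command(db_name, user):
    """Build the argv for dumping one MediaWiki database (hypothetical names).

    --single-transaction takes a consistent snapshot of InnoDB tables
    without locking them for the whole dump.
    """
    return ["mysqldump", "--single-transaction", "-u", user, db_name]


def dump_database(db_name, user, out_path):
    """Run mysqldump on the server and write its SQL output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(mysqldump_command(db_name, user), stdout=out, check=True)


def tar_wiki_directory(wiki_dir, archive_path):
    """Pack the wiki directory (code, LocalSettings.php, images/) into a .tar.gz."""
    wiki_dir = Path(wiki_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the wiki directory's own name.
        tar.add(str(wiki_dir), arcname=wiki_dir.name)
    return archive_path
```

For example, dump_database("wikidb", "wikiuser", "wikidb.sql") followed by tar_wiki_directory("/var/www/wiki", "wiki-files.tar.gz") (both hypothetical values) produces the two files friedel described. Copying them off the server, as in the weekly download at 14:29:06, is still a separate step.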
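The dumpBackup.php route described at 14:32:03 comes down to one command run in a shell on the server. Below is a minimal Python 3 sketch. It assumes PHP is installed on the server and that the wiki lives in a hypothetical directory such as /var/www/wiki. The script's --full mode exports every revision and --current exports only the latest one. Check the manual page for your MediaWiki version before relying on either.

```python
import subprocess
from pathlib import Path


def dump_backup_command(mediawiki_dir, full_history=True):
    """Build the argv for MediaWiki's dumpBackup.php maintenance script.

    --full exports every revision; --current exports only the latest one.
    """
    script = Path(mediawiki_dir) / "maintenance" / "dumpBackup.php"
    return ["php", str(script), "--full" if full_history else "--current"]


def run_dump_backup(mediawiki_dir, out_path, full_history=True):
    """Run dumpBackup.php (requires shell access) and save its XML output."""
    with open(out_path, "wb") as out:
        subprocess.run(
            dump_backup_command(mediawiki_dir, full_history),
            stdout=out,
            check=True,
        )
```

The script writes XML to standard output, which is why the sketch redirects stdout into a file instead of passing an output path.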
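Special:Export (14:33:00) can also be driven by a URL instead of the web form. This Python 3 sketch rests on assumptions. The parameter names (pages, curonly, action=submit) follow the MediaWiki.org page "Manual:Parameters to Special:Export", and wiki.example.org is a placeholder. As noted in the talk, the result contains page text only, not images.

```python
from urllib.parse import urlencode


def special_export_url(index_url, titles, current_only=True):
    """Build a Special:Export URL for a handful of pages (no images included).

    Titles are joined with newlines, as in the form's "pages" text box.
    """
    params = {"title": "Special:Export", "pages": "\n".join(titles), "action": "submit"}
    if current_only:
        params["curonly"] = "1"  # latest revision only; omit for full history
    return index_url + "?" + urlencode(params)
```

Opening the returned URL in a browser, or with urllib.request.urlopen, should return the XML for the listed pages. Listing titles by hand quickly becomes impractical for wikis with thousands of pages, which is the limitation raised at 14:35:27.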
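The difference between wikitext and generated HTML mentioned at 14:39:30 is easy to see on any MediaWiki page. The normal page URL returns rendered HTML, which is what a crawler stores. Adding action=raw returns the page's source wikitext instead. Below is a small Python 3 sketch using a placeholder wiki URL. Note that action=raw returns only the latest revision, not the page history.

```python
from urllib.parse import quote


def page_urls(index_url, title):
    """Return (rendered HTML URL, raw wikitext URL) for one wiki page.

    A crawler only sees the first; action=raw gives the source wikitext,
    which is what a wiki backup actually needs to keep.
    """
    encoded = quote(title.replace(" ", "_"))  # MediaWiki uses underscores in URLs
    html_url = f"{index_url}?title={encoded}"
    raw_url = f"{index_url}?title={encoded}&action=raw"
    return html_url, raw_url
```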
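To check what the XML file described at 14:45:27 actually contains, a short Python 3 script can count the revisions of each page. This sketch assumes the standard MediaWiki export layout: each page element holds a title element and one revision element per saved version. The namespace URI changes between export schema versions, so the code ignores it.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Drop the '{namespace}' prefix; the export schema URI differs between versions."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(source):
    """Return {page title: number of revisions} for a MediaWiki XML dump.

    source is a file path or a binary file object. iterparse streams the
    file, so large dumps are not loaded into memory all at once.
    """
    counts = {}
    title, revisions = None, 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # the revision text is no longer needed
        elif name == "page":
            counts[title] = revisions
            title, revisions = None, 0
            elem.clear()
    return counts
```

A page with only one counted revision usually means the dump was made with the "latest revision only" option mentioned at 15:06:06.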
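The API and version questions (14:48:33 to 14:52:14) can also be answered by a script. A request to api.php?action=query&meta=siteinfo returns a "generator" string such as "MediaWiki 1.16.5", the same version that Special:Version shows. A wiki without a working API will not return this JSON. Below is a Python 3 sketch with a placeholder URL. The (1, 10) threshold in the check below is only an illustration of "very old", not a WikiTeam rule.

```python
import json
import re
from urllib.parse import urlencode


def siteinfo_url(api_url):
    """URL asking the MediaWiki API for general site info, including the version."""
    return api_url + "?" + urlencode({"action": "query", "meta": "siteinfo", "format": "json"})


def mediawiki_version(siteinfo_json):
    """Extract (major, minor) from the 'generator' field, e.g. 'MediaWiki 1.16.5' -> (1, 16)."""
    generator = json.loads(siteinfo_json)["query"]["general"]["generator"]
    match = re.match(r"MediaWiki (\d+)\.(\d+)", generator)
    if match is None:
        raise ValueError(f"unexpected generator string: {generator!r}")
    return int(match.group(1)), int(match.group(2))
```

Comparing tuples such as (1, 16) >= (1, 10) avoids the trap of comparing version strings, where "1.9" would sort after "1.16".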
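The restore steps at 15:02:00 (import the XML with importDump.php, then unpack the images) might look like this in Python 3. This is a sketch only, and the paths are hypothetical. It assumes a blank MediaWiki is already installed and the images were packed as a tar archive, and it does not reinstall extensions. MediaWiki also ships a maintenance/importImages.php script, which registers files in the database rather than just copying them. Whether you need it depends on how the images were saved.

```python
import tarfile
from pathlib import Path


def import_dump_command(mediawiki_dir, xml_path):
    """argv for loading the XML file into a freshly installed, blank MediaWiki."""
    script = Path(mediawiki_dir) / "maintenance" / "importDump.php"
    return ["php", str(script), str(xml_path)]


def unpack_images(images_archive, target_dir):
    """Extract the packed images under target_dir; return the extracted file names."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(images_archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and ".." entries in the archive.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
        return [member.name for member in tar.getmembers() if member.isfile()]
```

Run the command returned by import_dump_command on the new server, for example with subprocess.run(cmd, check=True). Then call unpack_images with the wiki's directory as the target.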
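The exchange from 15:04:31 to 15:08:32 suggests that, since extension code can be downloaded again, the key thing to keep is the list of enabled extensions. In MediaWiki 1.16 that list lives in LocalSettings.php as require_once lines. Below is a Python 3 sketch that extracts the names. It assumes the common "$IP/extensions/<Name>/<Name>.php" layout and ignores extensions loaded any other way.

```python
import re

# Matches ".../extensions/<Name>/..." inside require/include lines,
# the usual way extensions were enabled in LocalSettings.php at the time.
_EXT_RE = re.compile(r"""(?:require|include)(?:_once)?\s*\(?\s*["'][^"']*extensions/([^/"']+)/""")


def enabled_extensions(localsettings_text):
    """Return extension names enabled via require/include, in file order."""
    names = []
    for line in localsettings_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#", "//")):
            continue  # commented-out extensions are not active
        match = _EXT_RE.search(stripped)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```

Saving this list alongside the WikiTeam dump records which extensions to reinstall, even when the extensions/ directory itself cannot be copied.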
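The command at 15:19:33 can also be built and run from Python, for instance from a scheduled job. This Python 3 sketch only assembles the arguments used in the demo (--api, --index, --xml, --images). It can optionally add --curonly, which is assumed to be the name of the "latest revision only" option mentioned at 15:06:06; check the script's usage message. The interpreter name defaults to "python", which the talk expects to be Python 2.7 (14:54:59).

```python
import subprocess


def dumpgenerator_command(api_url, index_url, images=True, current_only=False,
                          interpreter="python"):
    """Assemble the dumpgenerator.py arguments used in the demo."""
    cmd = [interpreter, "dumpgenerator.py",
           "--api=" + api_url, "--index=" + index_url, "--xml"]
    if images:
        cmd.append("--images")  # also download every uploaded file
    if current_only:
        # Assumed flag name for "latest revision only"; verify against the script's usage text.
        cmd.append("--curonly")
    return cmd


def run_dumpgenerator(script_dir, api_url, index_url, **options):
    """Run the dump from the directory where dumpgenerator.py was saved."""
    subprocess.run(dumpgenerator_command(api_url, index_url, **options),
                   cwd=script_dir, check=True)
```

Setting cwd to the directory that holds dumpgenerator.py mirrors the "cd" step at 15:19:01.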