
Here we coordinate the effort to download all non-WikiFarm wikis (wikis that are not hosted on a wiki farm).

We start with a list from Andrew Pavlo, which is mirrored here. After discarding dead wikis, roughly 7,400 live wikis remain.


## Requirements

You need GNU/Linux, Python, and p7zip-full (`sudo apt-get install p7zip-full`).
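On a Debian-based system, for example, a quick setup check might look like this (other distributions have equivalent packages):

```bash
# Install 7z support, used to compress the dumps.
sudo apt-get install p7zip-full

# Confirm Python is present; these scripts date from the Python 2 era.
python --version
```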

## How to join the effort?

Download a list (every list has 100 wikis):

  * Choose one of [lists 000 to 073](https://github.com/WikiTeam/wikiteam/tree/master/batchdownload)

EDIT: the files taskforce/list000 to taskforce/list073 appear to have been replaced by taskforce/mediawikis_done_2014. Were they all archived up to 2014, then?

Then use the following scripts to back up the wikis:

  * [launcher.py](https://raw.githubusercontent.com/WikiTeam/wikiteam/master/launcher.py)
  * [dumpgenerator.py](https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py)

After downloading a list and both scripts into the same directory, run: `python launcher.py mylist.txt`

It will back up every wiki and generate two 7z files for each one: `domain-date-history.xml.7z` (wikitext only) and `domain-date-wikidump.7z` (wikitext plus the `images/` directory).
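For example, a minimal session might look like this (a sketch assuming `wget` is available; `mylist.txt` stands for whichever list you claimed):

```bash
# Fetch the two scripts into the current directory.
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/launcher.py
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py

# mylist.txt is the list of wiki URLs you claimed, one per line.
# Back up every wiki on it; the 7z archives appear in the same directory.
python launcher.py mylist.txt
```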

See also the Tutorial page for details on how the scripts work.

Note: we recommend splitting every 100-wiki list into 10 lists of 10 wikis each. You can do that with the `split` command, as shown below.
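For instance (a sketch; `split`'s default suffixes produce `list000-aa`, `list000-ab`, and so on):

```bash
# Break list000 into chunks of 10 lines each.
split -l10 list000 list000-

# Each chunk can then be processed independently, e.g.:
python launcher.py list000-aa
```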

## Volunteers table

Lists claimed by users. Notify us when you start downloading a list of wikis. If you can't edit this page, email us on our mailing list (you have to join first).

| List | Member | Status | Notes |
|------|--------|--------|-------|
| 000 | emijrp | Downloading | Downloaded: X. Incomplete: X. Errors: X. |
| 001 | ScottDB | Downloaded | ... |
| 002 | underscor | Downloading | ... |
| 003 | Nemo | Downloaded | Final check. Includes citywiki.ugr.es, which is archived. *All wikis done* except 6 wikis whose downloads can't finish; biosite is still being downloaded. *See logs*. |
| 004 | ianmcorvidae | Downloading | Downloaded: 61; variety of errors, mostly index.php missing |
| 005 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
| 006 | Nemo | Downloaded | Final check |
| 007 | Nemo | Downloaded | Final check |
| 008 | Nemo | Downloaded | Final check |
| 009 | Nemo | Downloaded | Final check |
| 010 | Nemo | Downloaded | Final check (katlas.math.toronto.edu has 1.6 million pages, finished as well) |
| 011 | mutoso | Downloaded | Final check |
| 012 | Nemo | Downloaded | Final check |
| 013 | Nemo | Downloaded | See above. |
| 014 | Nemo | Downloaded | Final check |
| 015 | Nemo | Downloaded | Final check |
| 016 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
| 017 | Nemo | Downloaded | Final check |
| 018 | Nemo | Downloaded | Final check |
| 019 | Nemo | Downloaded | See above. |
| 020 | mutoso | ... | |
| 021 | Nemo | Downloaded | Final check |
| 022 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
| 023 | Nemo | Downloaded | See above. |
| 024 | Nemo | Downloaded | Final check |
| 025 | Nemo | Downloaded | See above. |
| 026 | Nemo | Downloaded | See above. |
| 027 | Nemo | Downloaded | See above. |
| 028 | Nemo | Downloaded | See above. |
| 029 | Nemo | Downloaded | See above. |
| 030 | Nemo | Downloaded | See above. |
| 031 | Nemo | Downloaded | See above. |
| 032 | Nemo | Downloaded | See above. |
| 033 | Nemo | Downloaded | See above. |
| 034 | Nemo | Downloaded | See above. |
| 035 | Nemo | Downloaded | See above. |
| 036 | Nemo | Downloaded | Except a single wiki still being downloaded |
| 037 | Nemo | Downloaded | See above. |
| 038 | Nemo | Downloaded | See above. |
| 039 | Nemo | Downloaded | See above. |
| 040 | Nemo | Downloaded | See above. |
| 041 | Nemo | Downloaded | See above. |
| 042 | Nemo | Downloaded | See above. |
| 043 | Nemo | Downloaded | See above. |
| 044 | Nemo | Downloaded | See above. |
| 045 | Nemo | Downloaded | See above. |
| 046 | Nemo | Downloaded | See above. |
| 047 | Nemo | Downloaded | See above. |
| 048 | Nemo | Downloaded | See above. |
| 049 | Nemo | Downloaded | See above. |
| 050 | Nemo | Downloaded | See above. |
| 051 | Nemo | Downloaded | See above. |
| 052 | Nemo | Downloaded | See above. |
| 053 | Nemo | Downloaded | See above. |
| 054 | Nemo | Downloaded | See above. |
| 055 | Nemo | Downloaded | See above. |
| 056 | Nemo | Downloaded | See above. |
| 057 | Nemo | Downloaded | See above. |
| 058 | Nemo | Downloaded | See above. |
| 059 | Nemo | Downloaded | See above. |
| 060 | Nemo | Downloaded | See above. |
| 061 | Nemo | Downloaded | See above. |
| 062 | Nemo | Downloaded | See above. |
| 063 | Nemo | Downloaded | See above. |
| 064 | Nemo | Downloaded | See above. |
| 065 | Nemo | Downloaded | See above. |
| 066 | underscor | Downloading | ... |
| 067 | underscor | Downloading | ... |
| 068 | underscor | Downloading | ... |
| 069 | underscor | Downloading | ... |
| 070 | underscor | Downloading | ... |
| 071 | underscor | Downloading | ... |
| 072 | underscor | Downloading | ... |
| 073 | underscor | Downloading | ... |

## Where/How to upload the dumps?

We are uploading the dumps to the WikiTeam Collection at the Internet Archive.

When you have finished a list of wikis and the script has 7-zipped them, you can upload the dumps.

Download uploader.py into the same directory as the .7z dumps, dumpgenerator.py, and your list file.

You will need Internet Archive S3 keys associated with your user account. Generate them and save both in a file named keys.txt (access key on the first line, secret key on the second).

To run it: `python uploader.py mylist.txt`

It generates a log file which you must preserve to avoid reuploading the same dumps if you re-run the upload script.
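Putting the upload steps together, a minimal sketch (the raw URL for uploader.py assumes it sits at the repository root like the other scripts; the key values are placeholders):

```bash
# Fetch the uploader script (assumed location, alongside the other scripts).
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/uploader.py

# keys.txt: access key on the first line, secret key on the second.
# The values below are placeholders; use your own Internet Archive S3 keys.
printf '%s\n%s\n' 'YOUR_IA_ACCESS_KEY' 'YOUR_IA_SECRET_KEY' > keys.txt

# Upload every finished dump from the list; keep the log file it produces.
python uploader.py mylist.txt
```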
