TaskForce
Here we coordinate the effort to download all non-WikiFarm wikis.
We start from a list by Andrew Pavlo, which is mirrored here. After discarding dead wikis, about 7,400 live wikis remain.
You need GNU/Linux, Python and p7zip-full (sudo apt-get install p7zip-full).
Download a list (every list has 100 wikis):
* Choose one from [000 to 073](https://github.com/WikiTeam/wikiteam/tree/master/batchdownload) (a fetch example follows the note below)
EDIT: the files taskforce/list000 to taskforce/list073 appear to have been replaced by taskforce/mediawikis_done_2014. Were they all archived up to 2014, then?
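For example, a list can be fetched straight from the repository with wget; the path below is an assumption based on the old taskforce/list000..list073 layout and may need adjusting given the note above:

```bash
# Fetch a 100-wiki list from the repository (path assumed from the old
# taskforce/listNNN layout; adjust if the lists have been moved or renamed).
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/taskforce/list000 -O mylist.txt

# Each line should be a wiki URL; count them to sanity-check the download.
wc -l mylist.txt
```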
And use the following scripts to backup the wikis:
* [launcher.py](https://raw.githubusercontent.com/WikiTeam/wikiteam/master/launcher.py)
* [dumpgenerator.py](https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py)
After downloading a list and both scripts into the same directory, you can run: python launcher.py mylist.txt
It will back up every wiki and generate two 7z files for each one: domain-date-history.xml.7z (just wikitext) and domain-date-wikidump.7z (wikitext plus the images/ directory).
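A minimal end-to-end sketch of this step, assuming the list file is named mylist.txt:

```bash
# Download the two scripts next to your list file.
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/launcher.py
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py

# Back up every wiki in the list; for each wiki this should leave
# domain-date-history.xml.7z and domain-date-wikidump.7z in the current directory.
python launcher.py mylist.txt
```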
See also the Tutorial page for details on how the scripts work.
Note: we recommend splitting each 100-wiki list into 10 lists of 10 wikis. You can do that with the split command, like this: split -l10 list000 list000-
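For example, splitting list000 produces ten 10-wiki pieces, named with split's default alphabetic suffixes:

```bash
# Split the 100-wiki list into 10 lists of 10 wikis each.
split -l10 list000 list000-

# The pieces are named list000-aa, list000-ab, ..., list000-aj.
ls list000-*
```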
Lists claimed by users. Notify us when you start downloading a list of wikis. If you can't edit this page, email our mailing list (you have to join it first).
List | Member | Status | Notes |
---|---|---|---|
000 | emijrp | Downloading | Downloaded: X. Incomplete: X. Errors: X. |
001 | ScottDB | Downloaded | ... |
002 | underscor | Downloading | ... |
003 | Nemo | Downloaded | Final check. Includes citywiki.ugr.es, which is archived. *All wikis done* except 6 wikis whose downloads can't finish; biosite is still being downloaded. *See logs*. |
004 | ianmcorvidae | Downloading | Downloaded: 61; variety of errors, mostly index.php missing |
005 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
006 | Nemo | Downloaded | Final check |
007 | Nemo | Downloaded | Final check |
008 | Nemo | Downloaded | Final check |
009 | Nemo | Downloaded | Final check |
010 | Nemo | Downloaded | Final check (katlas.math.toronto.edu has 1.6 million pages, finished as well) |
011 | mutoso | Downloaded | Final check |
012 | Nemo | Downloaded | Final check |
013 | Nemo | Downloaded | See above. |
014 | Nemo | Downloaded | Final check |
015 | Nemo | Downloaded | Final check |
016 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
017 | Nemo | Downloaded | Final check |
018 | Nemo | Downloaded | Final check |
019 | Nemo | Downloaded | See above. |
020 | mutoso | ... | |
021 | Nemo | Downloaded | Final check |
022 | Nemo | Downloaded | Final check; 1 wiki being redownloaded |
023 | Nemo | Downloaded | See above. |
024 | Nemo | Downloaded | Final check |
025 | Nemo | Downloaded | See above. |
026 | Nemo | Downloaded | See above. |
027 | Nemo | Downloaded | See above. |
028 | Nemo | Downloaded | See above. |
029 | Nemo | Downloaded | See above. |
030 | Nemo | Downloaded | See above. |
031 | Nemo | Downloaded | See above. |
032 | Nemo | Downloaded | See above. |
033 | Nemo | Downloaded | See above. |
034 | Nemo | Downloaded | See above. |
035 | Nemo | Downloaded | See above. |
036 | Nemo | Downloaded | Except a single wiki still being downloaded |
037 | Nemo | Downloaded | See above. |
038 | Nemo | Downloaded | See above. |
039 | Nemo | Downloaded | See above. |
040 | Nemo | Downloaded | See above. |
041 | Nemo | Downloaded | See above. |
042 | Nemo | Downloaded | See above. |
043 | Nemo | Downloaded | See above. |
044 | Nemo | Downloaded | See above. |
045 | Nemo | Downloaded | See above. |
046 | Nemo | Downloaded | See above. |
047 | Nemo | Downloaded | See above. |
048 | Nemo | Downloaded | See above. |
049 | Nemo | Downloaded | See above. |
050 | Nemo | Downloaded | See above. |
051 | Nemo | Downloaded | See above. |
052 | Nemo | Downloaded | See above. |
053 | Nemo | Downloaded | See above. |
054 | Nemo | Downloaded | See above. |
055 | Nemo | Downloaded | See above. |
056 | Nemo | Downloaded | See above. |
057 | Nemo | Downloaded | See above. |
058 | Nemo | Downloaded | See above. |
059 | Nemo | Downloaded | See above. |
060 | Nemo | Downloaded | See above. |
061 | Nemo | Downloaded | See above. |
062 | Nemo | Downloaded | See above. |
063 | Nemo | Downloaded | See above. |
064 | Nemo | Downloaded | See above. |
065 | Nemo | Downloaded | See above. |
066 | underscor | Downloading | ... |
067 | underscor | Downloading | ... |
068 | underscor | Downloading | ... |
069 | underscor | Downloading | ... |
070 | underscor | Downloading | ... |
071 | underscor | Downloading | ... |
072 | underscor | Downloading | ... |
073 | underscor | Downloading | ... |
We are uploading the dumps to the WikiTeam Collection at Internet Archive.
When you have finished a list of wikis and the script has 7-zipped them, you can upload the dumps.
Download uploader.py into the same directory that contains the .7z dumps, dumpgenerator.py and the list file (list.txt).
You will need the Internet Archive S3 keys associated with your user account. Generate them and save both in a file named keys.txt (first line: access key; second line: secret key).
To execute, do this: python uploader.py mylist.txt
It generates a log file which you must preserve to avoid reuploading the same dumps if you re-run the upload script.
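Putting the upload step together; the keys below are placeholders, and the uploader.py URL is an assumption that follows the same raw URL pattern as the other scripts:

```bash
# keys.txt: first line is the access key, second line the secret key
# (placeholder values; use your own Internet Archive S3 keys).
printf 'YOUR_IA_ACCESS_KEY\nYOUR_IA_SECRET_KEY\n' > keys.txt

# Fetch the uploader into the directory with the .7z dumps, dumpgenerator.py
# and the list file, then upload every dump from the list.
wget https://raw.githubusercontent.com/WikiTeam/wikiteam/master/uploader.py
python uploader.py mylist.txt

# Keep the generated log file; it lets a re-run skip dumps already uploaded.
```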