ArchiveBox is a powerful, self-hosted internet archiving solution written in Python. It lets you collect, save, and view the sites you want to keep offline. ArchiveBox can be used as a command-line tool or a desktop app, or accessed via the web. It is a cross-platform tool available for Linux, macOS, and Windows systems.
Below are some notable features of ArchiveBox:
- You can feed it URLs one at a time, or schedule regular imports from your browser's bookmarks, history, feeds, and more.
- It saves snapshots of the URLs you feed it in several formats: HTML, PDF, PNG screenshots, WARC, and more.
In this guide, we will walk through how to install, configure, and use the ArchiveBox self-hosted internet archiving solution.
Install ArchiveBox self-hosted internet archiving solution
There are several methods you can use to install ArchiveBox. This guide covers two of them:
- Using PIP3
- Using Docker
#1. Install ArchiveBox using Pip3
For this method, make sure that Python 3.7 or later and Node.js 12 or later are installed on your system. Then install pip:
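Before installing, you can confirm both prerequisites from the shell. The following is a minimal POSIX sh sketch; the helper names `version_ge` and `check_prereqs` are hypothetical, and it only checks the minimum versions stated above (Python 3.7 and Node.js 12).

```shell
# version_ge A B: exit 0 if dotted version A >= B (missing parts count as 0)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_prereqs: compare installed Python 3 and Node.js versions with the minimums
# (hypothetical helper; the body runs in a subshell so its variables do not leak)
check_prereqs() (
    status=0
    py=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null) || py=""
    nd=$(node --version 2>/dev/null | sed 's/^v//')
    if [ -n "$py" ] && version_ge "$py" 3.7; then
        echo "[ok] Python $py"
    else
        echo "[!!] Python 3.7+ required (found: ${py:-none})"; status=1
    fi
    if [ -n "$nd" ] && version_ge "$nd" 12; then
        echo "[ok] Node.js $nd"
    else
        echo "[!!] Node.js 12+ required (found: ${nd:-none})"; status=1
    fi
    exit "$status"
)
```

Paste the functions into your shell and run `check_prereqs`; it exits non-zero if either requirement is missing. The comparison is done numerically per component, so `3.10` correctly counts as newer than `3.7`.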
##On Debian/Ubuntu
sudo apt install python3-pip
##On RHEL/CentOS/Rocky Linux 8
sudo yum install epel-release
sudo yum install python3-pip
##On openSUSE
sudo zypper install python3-pip
##On Arch Linux
sudo pacman -S python-pip
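If you script this step, the distribution can be detected from the `ID` field of `/etc/os-release`. Below is a minimal sketch, assuming the `ID` values `debian`, `ubuntu`, `rhel`, `centos`, `rocky`, `opensuse-*`, and `arch`. The helper names are hypothetical, and the commands it prints are exactly the ones listed above.

```shell
# detect_os_id: print the ID field from /etc/os-release (empty if unavailable)
detect_os_id() (
    # os-release is a shell-compatible KEY=value file; source it in a subshell
    . /etc/os-release 2>/dev/null && echo "${ID:-}"
)

# pip_install_cmd ID: print the pip3 install command for a given os-release ID
pip_install_cmd() {
    case "$1" in
        debian|ubuntu)     echo "sudo apt install python3-pip" ;;
        rhel|centos|rocky) echo "sudo yum install epel-release && sudo yum install python3-pip" ;;
        opensuse*)         echo "sudo zypper install python3-pip" ;;
        arch)              echo "sudo pacman -S python-pip" ;;
        *) echo "unsupported distribution: ${1:-unknown}" >&2; return 1 ;;
    esac
}
```

For example, `cmd=$(pip_install_cmd "$(detect_os_id)") && eval "$cmd"` runs the matching command, and stops with an error on distributions the sketch does not know about.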
With pip3 installed, install ArchiveBox:
sudo pip3 install archivebox
Create a directory for your collection and initialize ArchiveBox in it:
mkdir ~/archivebox && cd ~/archivebox
archivebox init --setup
Start the ArchiveBox webserver.
archivebox server 0.0.0.0:8000
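This method is prone to dependency problems, because several of the tools ArchiveBox relies on must be installed separately, so the Docker method below is recommended instead.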
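If you stay with the pip method, it can help to see which external tools are missing before running `archivebox init`. Below is a minimal sketch; the helper name is hypothetical, and the command list in the usage line is an assumption, not ArchiveBox's official dependency list. `archivebox version` reports ArchiveBox's own view of its dependencies.

```shell
# missing_commands CMD...: print each command that is not found on PATH, one per line
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

For example, `missing_commands python3 node npm wget curl git chromium` prints the names that are not installed. On some distributions the browser binary has a different name, such as `chromium-browser` or `google-chrome`.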
#2. Install ArchiveBox using Docker Compose (Recommended)
Begin by installing Docker on your Linux system.
Then start and enable the Docker service:
sudo systemctl enable docker
sudo systemctl start docker
Install docker-compose.
curl -s https://api.github.com/repos/docker/compose/releases/latest | grep browser_download_url | grep docker-compose-linux-x86_64 | cut -d '"' -f 4 | wget -qi -
chmod +x docker-compose-linux-x86_64
sudo mv docker-compose-linux-x86_64 /usr/local/bin/docker-compose
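The commands above hard-code the x86_64 binary. On other CPU architectures, a sketch like the following could pick the release asset from `uname -m`. The helper names are hypothetical, and the asset names are assumed to follow the same `docker-compose-linux-<arch>` pattern used above. Matching the closing quote also avoids selecting a checksum asset such as `docker-compose-linux-x86_64.sha256`, if the release publishes one.

```shell
# compose_asset ARCH: map `uname -m` output to a Docker Compose v2 Linux asset name
# (asset names are an assumption based on the docker-compose-linux-x86_64 pattern)
compose_asset() {
    case "$1" in
        x86_64|amd64)  echo docker-compose-linux-x86_64 ;;
        aarch64|arm64) echo docker-compose-linux-aarch64 ;;
        armv7l)        echo docker-compose-linux-armv7 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# install_compose: download the latest release binary for this machine and install it
install_compose() (
    asset=$(compose_asset "$(uname -m)") || exit 1
    # Match the closing quote so only the binary URL is selected, not checksum files
    curl -s https://api.github.com/repos/docker/compose/releases/latest \
        | grep browser_download_url \
        | grep "/$asset\"" \
        | cut -d '"' -f 4 \
        | wget -qi - || exit 1
    chmod +x "$asset" && sudo mv "$asset" /usr/local/bin/docker-compose
)
```

Run `install_compose && docker-compose version` to install the binary and confirm that it works.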
Add your user to the docker group.
sudo usermod -aG docker $USER
newgrp docker
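Group changes apply only to new login sessions (or to the shell started by `newgrp`), so it is worth checking the result. Below is a minimal sketch with hypothetical helper names. `id -nG USER` reads the group database, so it shows the new membership immediately, while `id -nG` without a user reports only the groups of the current process.

```shell
# has_word WORD LIST: succeed if WORD is one of the space-separated words in LIST
has_word() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# docker_group_status: compare the group database with the current shell's groups
docker_group_status() (
    user=$(id -un)
    if ! has_word docker "$(id -nG "$user")"; then
        echo "$user is not in the docker group"; exit 1
    fi
    if has_word docker "$(id -nG)"; then
        echo "The docker group is active in this shell"
    else
        echo "$user is in the docker group; log out and back in, or run: newgrp docker"
    fi
)
```

Matching whole words means a group such as `dockerd` is not mistaken for `docker`.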
Download the docker-compose YAML file
curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml'
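Initialize a new ArchiveBox collection and create an admin user:
docker-compose run archivebox init --setup
The command initializes the collection and then prompts for the admin account details: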
[√] Done. A new ArchiveBox collection was initialized (0 links).
[+] Creating new admin user for the Web UI...
Username (leave blank to use 'archivebox'): admin
Email address: Enter your email address
Password: Enter your Password
Password (again): Enter the Password again
Start the container.
$ docker-compose up
The server is now up and running.
[+] Running 1/1
⠿ Container thor-archivebox-1 Created 0.3s
Attaching to thor-archivebox-1
thor-archivebox-1 | [i] [2021-12-20 09:32:05] ArchiveBox v0.6.2: archivebox server --quick-init 0.0.0.0:8000
thor-archivebox-1 | > /data
thor-archivebox-1 |
thor-archivebox-1 | [^] Verifying and updating existing ArchiveBox collection to v0.6.2...
.......
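Access the web UI at http://localhost:8000 on the server itself, or at http://IP_Address:8000 from another machine. The address 0.0.0.0 in the log only means that the server listens on all network interfaces.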
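Because `docker-compose up` keeps running in the foreground, you may want to check from a second terminal (or from a script) that the web UI is answering. Below is a minimal sketch with hypothetical helper names; it assumes `curl` is installed and that the UI listens on port 8000, as in the log above.

```shell
# archivebox_url HOST [PORT]: build the web UI address (port defaults to 8000)
archivebox_url() {
    echo "http://$1:${2:-8000}"
}

# wait_for_archivebox URL [ATTEMPTS]: poll URL every 2 seconds until it answers
wait_for_archivebox() (
    url=$1; attempts=${2:-30}; i=1
    while :; do
        # -f treats HTTP errors (4xx/5xx) as failures; redirects count as "up"
        if curl -fsS -o /dev/null --max-time 2 "$url"; then
            echo "ArchiveBox is up at $url"; exit 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            echo "no answer from $url after $attempts attempts" >&2; exit 1
        fi
        i=$((i + 1)); sleep 2
    done
)
```

For example, `wait_for_archivebox "$(archivebox_url localhost)"` waits up to about a minute with the default of 30 attempts.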
Use ArchiveBox self-hosted internet archiving solution
Once it is installed, you can start using ArchiveBox to save the sites you want to keep offline.
To archive a URL, add it as below:
$ archivebox add 'https://example.com'
With Docker Compose:
$ docker-compose run archivebox add 'https://example.com'
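To archive many URLs at once, you can keep them in a text file. Below is a minimal sketch with hypothetical helper names that strips blank lines and `#` comments before piping the list to `archivebox add`. It assumes, as the ArchiveBox README shows, that `add` reads URLs from standard input. With Docker Compose, `-T` disables the pseudo-TTY so that the piped input reaches the container.

```shell
# clean_url_list FILE: print the file's URLs, trimming whitespace and
# skipping blank lines and lines that start with '#'
clean_url_list() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v -e '^$' -e '^#'
}

# add_url_file FILE [docker]: archive every URL in FILE, locally or via Docker Compose
add_url_file() {
    if [ "${2:-}" = docker ]; then
        clean_url_list "$1" | docker-compose run -T archivebox add
    else
        clean_url_list "$1" | archivebox add
    fi
}
```

For example, `add_url_file urls.txt docker` archives every URL listed in `urls.txt` using the Docker setup.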
To schedule automatic imports, for example of new items in an RSS feed, use the command:
$ archivebox schedule --every=day --depth=1 https://example.com/rss.xml
With Docker Compose:
$ docker-compose run archivebox schedule --every=day --depth=1 https://example.com/rss.xml
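If you schedule several feeds, a small wrapper can reject invalid arguments before they reach the scheduler. Below is a minimal sketch with hypothetical helper names. It assumes the named intervals `hour`, `day`, `month`, and `year` and the depths `0` or `1`; check `archivebox schedule --help` for the full set of values your version accepts.

```shell
# valid_schedule_args EVERY DEPTH: accept only the intervals and depths assumed here
valid_schedule_args() {
    case "$1" in hour|day|month|year) ;; *) return 1 ;; esac
    case "$2" in 0|1) ;; *) return 1 ;; esac
}

# schedule_feed EVERY URL [DEPTH] [docker]: register a recurring import of URL
schedule_feed() (
    depth=${3:-1}
    if ! valid_schedule_args "$1" "$depth"; then
        echo "usage: schedule_feed hour|day|month|year URL [0|1] [docker]" >&2
        exit 1
    fi
    if [ "${4:-}" = docker ]; then
        docker-compose run archivebox schedule --every="$1" --depth="$depth" "$2"
    else
        archivebox schedule --every="$1" --depth="$depth" "$2"
    fi
)
```

For example, `schedule_feed day https://example.com/rss.xml 1 docker` is equivalent to the Docker Compose command above.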
View archived pages
In ArchiveBox, you can view the saved pages from the CLI or the web UI.
To list archived pages from the CLI:
$ archivebox list 'https://example.com'
Accessing and Using ArchiveBox Web UI
To view the archived pages in a browser, open http://IP_Address:8000.
To add more pages and manage ArchiveBox, click the + icon and log in with the admin credentials you created earlier.
From the ArchiveBox admin dashboard, you can manage users, accounts, snapshots, and more.
To add URLs, click Add + and provide the list of URLs to archive.
Scroll to the bottom of the page and submit the form to add the URLs.
To see the list of added URLs, navigate back to the home page.
Click a snapshot to view what was archived.
That is it!
I hope you enjoyed this guide on how to install and use ArchiveBox self-hosted internet archiving solution.