TOS Validator node maintenance and security
Introduction
This guide provides some basic information on maintaining and securing TOS Validator nodes.
This document assumes that a validator is installed using the configuration and tools recommended by TOS DAO but general concepts apply to other scenarios as well and can be useful for savvy sysadmins.
Maintenance
Database grooming
TOS Node/Validator keeps it's database within the path specified by --db
flag of validator-engine
, usually /var/tos-work/db
, this directory is created and managed by the node but it is recommended to perform a database grooming/cleanup task once a month to remove some artefacts.
Important: You must stop the validator process before performing the steps outlined below, failure to do that will likely cause database corruption.
The procedure takes ~5 minutes to complete and will not cause major service disruption.
Switch to root
sudo -s
Stop validator service
systemctl stop validator
Verify that validator is not running
systemctl status validator
Perform database cleanup
find /var/tos-work/db -name 'LOG.old*' -exec rm {} +
rm -r /var/tos-work/db/files/packages/temp.archive.*
Start validator service
systemctl start validator
Verify that the validator process is running by analysing the processes and log. Validator should re-sync with the network within a few minutes.
Backups
The easiest and most efficient way to backup the validator is to copy crucial node configuration files, keys and mytosctrl settings:
- Node configuration file:
/var/tos-work/db/config.json
- Node private keyring:
/var/tos-work/db/keyring
- Node public keys:
/var/tos-work/keys
- mytosctrl configuration and wallets:
$HOME/.local/share/mytos*
where $HOME is the home directory of the user who started the installation of mytosctrl OR/usr/local/bin/mytoscore
if you installed mytosctrl as root.
This set is everything you need to perform a recovery of your node from scratch.
Snapshots
Modern file systems such as ZFS offer snapshot functionality, most cloud providers also allow their customers to make snapshots of their machines during which the entire disk is preserved for future use.
The problem with both methods is that you must stop node before performing a snapshot, failure to do so will most likely result in a corrupt database with unexpected consequences. Many cloud providers also require you power down the machine before performing a snapshot.
Such stops should not be performed often, if you snapshot your node once a week then in the worst case scenario after recovery you will have a node with a week-old database and it will take your node more time to catch up with the network then to perform a new installation using mytosctrl "install from dump" feature (-d flag added during invocation of install.sh
script).
Disaster recovery
To perform recovery of your node on a new machine:
Install mytosctrl / node
For fastest node initialization add -d
switch to invocation of installation script.
Switch to root user
sudo -s
Stop mytoscore and validator processes
systemctl stop validator
systemctl stop mytoscore
Apply backed up node configuration files
- Node configuration file:
/var/tos-work/db/config.json
- Node private keyring:
/var/tos-work/db/keyring
- Node public keys:
/var/tos-work/keys
Set node IP address
If your new node has a different IP address then you must edit the node configuration file /var/tos-work/db/config.json
and set the leaf .addrs[0].ip
to decimal representation of new IP address. You can use this python script to convert your IP to decimal.
Ensure proper database permissions
chown -R validator:validator /var/tos-work/db
Apply backed up mytosctrl configuration files
Replace $HOME/.local/share/mytos*
where $HOME is home directory of user who started the installation of mytosctrl with backed-up content, make sure that the user is the owner of all files you copy.
Start mytoscore and validator processes
systemctl start validator
systemctl start mytoscore
Security
Host level security
Host level security is a huge topic which lies outside the scope of this document, we do however advise that you never install mytosctrl under root user, use a service account to ensure privilege separation.
Network level security
TOS Validators are high value assets that should be protected against external threats, one of the first steps you should take is to make your node as invisible as possible, this means locking down all network connections. On a validator node, only a UDP Port used for node operations should be exposed to the internet.
Tools
We will use a ufw firewall interface as well as a jq JSON command line processor.
Management Networks
As a node operator, you need to retain full control and access to the machine, in order to do this you need at least one fixed IP address or range.
We also advise you to setup a small "jumpstation" VPS with a fixed IP Address that can be used by you to access your locked down machine(s) if you do not have a fixed IP at your home/office or to add an alternative way to access secured machines should you lose your primary IP address.
Install ufw and jq1
sudo apt install -y ufw jq
Basic lockdown of ufw ruleset
sudo ufw default deny incoming; sudo ufw default allow outgoing
Disable automated ICMP echo request accept
sudo sed -i 's/-A ufw-before-input -p icmp --icmp-type echo-request -j ACCEPT/#-A ufw-before-input -p icmp --icmp-type echo-request -j ACCEPT/g' /etc/ufw/before.rules
Enable all access from management network(s)
sudo ufw insert 1 allow from <MANAGEMENT_NETWORK>
repeat the above command for each management network / address.
Expose node / validator UDP port to public
sudo ufw allow proto udp from any to any port `sudo jq -r '.addrs[0].port' /var/tos-work/db/config.json`
Doublecheck your management networks
Important: before enabling firewall, please double-check that you added the correct management addresses!Enable ufw firewall
sudo ufw enable
Checking status
To check the firewall status use the following command:
sudo ufw status numbered
Here is an example output of a locked-down node with two management networks / addresses:
Status: active
To Action From
-- ------ ----
[ 1] Anywhere ALLOW IN <MANAGEMENT_NETWORK_A>/28
[ 2] Anywhere ALLOW IN <MANAGEMENT_NETWORK_B>/32
[ 3] <NODE_PORT>/udp ALLOW IN Anywhere
[ 4] <NODE_PORT>/udp (v6) ALLOW IN Anywhere (v6)
Expose LiteServer port
sudo ufw allow proto tcp from any to any port `sudo jq -r '.liteservers[0].port' /var/tos-work/db/config.json`
Please note that the LiteServer port should not be exposed publicly on a validator.
More information on UFW
See this excellent ufw tutorial from Digital Ocean for more ufw magic.
IP Switch
If you believe that your node is under attack then you should consider switching IP Address. The way to achieve the switch depends on your hosting provider; you might pre-order a second address, clone your stopped VM into another instance or setup a new instance by performing a disaster recovery process.
In any case, please do make sure that you set your new IP Address in the node configuration file!