Yesterday, coming home from work, I was greeted with a chorus: "The Internet stopped working!" Uh-oh. Bad news for any part-time administrator.
I have noticed on occasion that the OpenBSD machine can stop forwarding packets. The solution is to log into it, fire a packet across the WAN and then one into the LAN, and it seems to pick up from where it left off. It's strange but infrequent.
Not this time. I reboot the OpenBSD firewall and get nothing. I jump on to the machine: locally I can get web pages from the Internet, but not from inside the network. Finally the penny drops when, on my laptop, I get a message saying that someone is "online on Skype".
Duh.
Non-web traffic is getting through, so it has to be something with the transparent web proxy. Sure enough, checking the logs I find:
Jul 21 18:11:46 cerberus squid[3627]: Write failure -- check your disk space and cache.log
Jul 21 18:11:46 cerberus squid[28891]: Squid Parent: child process 3627 exited due to signal 6
Jul 21 18:11:49 cerberus squid[28891]: Squid Parent: child process 18622 started
Jul 21 18:11:50 cerberus squid[18622]: Write failure -- check your disk space and cache.log
Jul 21 18:11:50 cerberus squid[28891]: Squid Parent: child process 18622 exited due to signal 6
Jul 21 18:11:53 cerberus squid[28891]: Squid Parent: child process 3621 started
Jul 21 18:11:54 cerberus squid[3621]: Write failure -- check your disk space and cache.log
Jul 21 18:11:54 cerberus squid[28891]: Squid Parent: child process 3621 exited due to signal 6
Jul 21 18:11:54 cerberus squid[28891]: Exiting due to repeated, frequent failures
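A quick way to confirm a diagnosis like this from the shell is to count the write failures in the log and then look at free space. This is only a minimal sketch: the helper name count_write_failures is made up for illustration, and the log path in the usage comment is an assumption, not something taken from this machine.

```shell
# Hypothetical helper: count Squid "Write failure" lines in a syslog-style file.
count_write_failures() {
    # $1 = log file; grep -c prints the number of matching lines (0 if none).
    grep -c 'squid.*Write failure' "$1"
}

# Assumed usage on the firewall (the log path is an assumption):
#   count_write_failures /var/log/messages
#   df -h /var
```

A non-zero count together with a nearly full /var in the df output points at the cache having eaten the partition.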
So, to all those who insist there's nothing wrong with one great big partition (more true if some fool does not over-allocate space to his web cache server... but I digress!): the only thing that failed in this scenario was the web cache. Even the logging kept working (as most Unix'ers would know, logs go to the /var directory) since, as a root-user task, it still had some space up its sleeve.
Okay. That's pretty obvious. The easy answer is to reduce the space allocated for the cache, for example from 8192MB to 7000MB. The downside is that I'd then have to nuke all of the cache and start again.
The much more insane solution is to say - hang it. I've paid for all of this data, so I'll expand the size of the partition. Now, being a smarty-pants, I've actually thought about this scenario before it happened. Most of the partitions live on disk 0 (/dev/sd0 - it's a SCSI disk), except for /var (where the cache files live) which is on disk 1.
Bewdy...
...nearly - unfortunately, I don't know how to expand partitions under OpenBSD (and the only references I could find at the time said that it wasn't possible) - but it's VMware ESXi to the rescue.
So, a few clicks later, I've got a new 30GB virtual disk set up - run a SCSI bus rescan to pick it up and place it into the device pool.
sudo disklabel -E sd2
To enter the disklabel programme and edit the tables interactively
sudo newfs /dev/sd2d
To format it and...
cd /mnt
sudo mkdir newvar
sudo mount /dev/sd2d /mnt/newvar
cd newvar
to mount the new var into the file system and...
sudo dump -0 -a -u -f - /var | sudo restore -r -f -
to copy the old /var into the new /var.
Next we need to tell the file system that we're changing disks - so edit the /etc/fstab file to reflect the new location.
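For the fstab edit, here is a minimal sketch of swapping the device for one mount point without touching the live file. It assumes /etc/fstab names the device directly in its first field; the helper name swap_fstab_device and the /tmp/fstab.new path are made up for illustration.

```shell
# Hypothetical helper: print an fstab with the device for one mount point replaced.
swap_fstab_device() {
    # $1 = fstab file, $2 = mount point, $3 = new device
    # Assigning $1 makes awk rebuild that line with single spaces between fields.
    awk -v mp="$2" -v dev="$3" '$2 == mp { $1 = dev } { print }' "$1"
}

# Assumed usage: write to a scratch file, review it, then copy it into place.
#   swap_fstab_device /etc/fstab /var /dev/sd2d > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Writing to a scratch file first means a typo can't leave the machine unable to find /var on the next boot.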
Finally, I performed a reboot and it all worked great.
Reading the squid configuration options a little harder tells us that we should not allocate more than 80% of the disk space to the cache. Hmm... 10,000MB partition, 8,192MB cache. Oops. Now it's set up a little differently, with a 20GB cache on a 30GB partition, which allows some more room to grow if necessary.
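The 80% rule of thumb is easy to check with a bit of shell arithmetic. This is a rough sketch; the helper name max_cache_mb is made up for illustration, and it treats 30GB as 30 * 1024 MB.

```shell
# Hypothetical helper: largest cache size (in MB) that stays within 80% of a partition.
max_cache_mb() {
    # $1 = partition size in MB; integer arithmetic only.
    echo $(( $1 * 80 / 100 ))
}

max_cache_mb 10000   # old 10,000MB partition -> prints 8000
max_cache_mb 30720   # new 30GB partition     -> prints 24576
```

So the old 8,192MB cache was over the 8,000MB limit, while the new 20GB (20,480MB) cache sits comfortably under 24,576MB.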
Filesystem    Size    Used   Avail  Capacity  Mounted on
/dev/sd0a    1008M   39.7M    918M        4%  /
/dev/sd0i     707M    533M    139M       79%  /home
/dev/sd0d     2.0G    2.0K    1.9G        0%  /tmp
/dev/sd0e     3.9G    424M    3.3G       11%  /usr
/dev/sd0f     3.9G   36.2M    3.7G        1%  /usr/local
/dev/sd0h     2.0G    335M    1.5G       17%  /usr/obj
/dev/sd0g     2.0G    647M    1.2G       34%  /usr/src
/dev/sd2d    29.5G    9.4G   18.7G       33%  /var