Alt account of [email protected] here.
Our instance is currently down and I can’t get remote access to the servers. It appears that there might have been a hardware failure of the main firewall, which is the one thing I can’t work around remotely.
I am still trying a few things, but I am not very optimistic that I can get access.
The really unfortunate part is that I am currently on one of my rare work deployments abroad, so I also can’t access it physically during the next few weeks, and my usual backup who could restart it is not available either.
Since nothing like this had happened in 3 years of operating the servers, I thought I could risk it, but Murphy’s law seems inescapable 😓
I will try to keep you posted here on any updates, but probably there will not be much I can do for a while. Really bad timing 😥
Edit: we might use this “opportunity” to migrate the instance to Piefed, which has been an idea for quite some time now. I will keep you posted on that.
I think it’s a law that servers run 100% perfectly until the literal day one leaves town with zero way to return home. One of the many reasons I got all my services off of unraid.
Very cool to learn you’re running your own machines. Do you go into detail about this anywhere?
We have a small write-up about the hardware on our wiki, but it is also down right now.
I think we will share a post-mortem write-up of the actual improvements we will make to avoid this in the future.
One thing I will definitely do is add a KVM remote management console to one of our server boards and move the main firewall into a VM with hardware passthrough of the NICs (this was planned anyway for a 10 Gbit network upgrade in the second half of 2025). This way I should be able to reboot and even reinstall the main ingress point remotely, so that only the fiber gateway remains as a failure point that requires physical access.