Alt account of [email protected] here.

Our instance is currently down and I can’t get remote access to the servers. It appears that there might have been a hardware failure of the main firewall, which is the one thing I can’t work around remotely.

I am still trying a few things, but I am not very optimistic that I can get access.

The really unfortunate part is that I am currently on one of my rare work deployments abroad, so I can’t physically access it for the next few weeks, and my usual backup, who could restart it, is not available either.

As nothing like this had ever happened in 3 years of operating the servers, I thought I could risk it, but Murphy’s law seems inescapable 😓

I will try to keep you posted here on any updates, but probably there will not be much I can do for a while. Really bad timing 😥

Edit: we might use this “opportunity” to migrate the instance to Piefed, which has been an idea for quite some time now. I will keep you posted on that.

  • oceanA · 18 upvotes · 1 day ago

    I think it’s a law that servers run 100% perfectly until the literal day one leaves town with zero way to return home. One of the many reasons I got all my services off of unraid.

    Very cool to learn you’re running your own machines. Do you go into detail about this anywhere?

    • Kris@feddit.org (OP) · 8 upvotes · edited 15 hours ago

      We have a small write-up about the hardware on our wiki, but it is also down right now.

      I think we will share a post-mortem write-up of the actual improvements we will make to avoid this in the future.

      One thing I will definitely do is add a KVM remote management console to one of our server boards and move the main firewall into a VM with hardware passthrough of the NICs (this was planned anyway for a 10 Gbit network upgrade in the second half of 2025). This way I should be able to reboot and even reinstall the main ingress point remotely, so that only the fiber gateway remains as a failure point that requires physical access.