Update on server issues

Two issues have been causing FAH some serious problems, and I wanted to give an update on both.


First, networking on subnet 171.64.122.XX seems to be very slow and easily overloaded. Our diagnostics (and donor input) point to a firewall on this subnet that can't handle the load. I have asked three different branches of our IT department to look into this, but nobody has a fix just yet. Getting impatient with the slow response, I requested a new network for our machines some time ago. This request is being processed. Once it's accepted, it will take a little time for them to get the new network in place (they may be able to do a VLAN, but more likely they will have to run a new physical cable, since a VLAN would still be behind the firewall). This subnet also hosts some of our collection servers, so resolving this will be a big help there too.

Second, the current server code can get overloaded, and when it does, it slows down. New code in the server notices this and restarts the server binary, which leads to about an hour of downtime each time it happens. While this does autofix the problem with just a little downtime, I'd like zero downtime (as would most people). I have paid a professional software house to rewrite our FAH server backend from scratch. That is almost done (it's in QA right now, with some fairly minor issues left to address). The new server code should fix the overload problem (and other issues), but it may introduce new issues that need to be smoothed out. However, the rewrite has a MUCH cleaner architecture, and that will be important going forward.

I just wanted to post an update to let people know where we are. These issues aren't quick to resolve, but we are making progress. The new server code in particular will be a big help over the next 5 years of FAH, thanks to its new architecture and much cleaner code.