Hi Nat,
Kentico 9's web farm support works by determining whether each server is healthy and adding it to or removing it from the synchronization process accordingly. This is determined by an async process that "pings" each server to make sure the other servers can access it. If a server is not reachable for over 3 minutes, it is moved to a "Not Responding" state.
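To make the rule above concrete, here is a minimal sketch of that kind of health check. It is not Kentico's actual implementation; the function names and the "Healthy" label are my own assumptions, and only the 3 minute threshold and the "Not Responding" state come from the description above.

```python
from datetime import datetime, timedelta

# Threshold from the description above: unreachable for over 3 minutes.
NOT_RESPONDING_AFTER = timedelta(minutes=3)


def classify_server(last_successful_ping, now):
    """Return a status label for one server (hypothetical helper)."""
    if now - last_successful_ping > NOT_RESPONDING_AFTER:
        return "Not Responding"
    # "Healthy" is an illustrative label, not necessarily Kentico's wording.
    return "Healthy"


def servers_to_sync(last_pings, now):
    """Keep only the servers that should stay in the synchronization process.

    last_pings maps a server name to the time of its last successful ping.
    """
    return [
        name
        for name, last_ping in last_pings.items()
        if classify_server(last_ping, now) == "Healthy"
    ]
```

The point is that a server that keeps missing pings drops out of synchronization even though the site on it may still be running, which is why connectivity between the servers matters here.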
You can find out more about how the Web Farm process works in this webinar.
From what you provided, it looks like there may be an issue with the servers communicating with each other. To verify this, I would look into the internal network to make sure there is no latency or other issues when pinging one server from another.
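If a plain ping is blocked or you want to test the actual web port, a small script like the one below can help. It is only a sketch: it times a TCP connection to a host and port, and the function names are made up for this example. Run it on each web farm server against the others, using your own host names and ports.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


def check_peers(peers, timeout=2.0):
    """Measure every (host, port) pair in peers (hypothetical helper)."""
    return {peer: tcp_connect_ms(peer[0], peer[1], timeout) for peer in peers}
```

A result of None, or times that jump around a lot between runs, would point to the network rather than Kentico itself.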
If you find there are no issues, I would recommend opening a support ticket, as the support team may need to look at your system further to understand what is happening.