Posted: Mar 17, 2020 in Company News
As many of you experienced today, there was a significant amount of downtime affecting access to your servers, the control panel and FTP. We deeply apologize for this and want to explain exactly what happened, why it happened and how we will ensure it doesn’t happen in the future.
How did this happen?
Today we planned a control panel update that we expected to take about 15 minutes. It included a core update that required us to update all nodes at the same time instead of doing a regional rollout. Typically, we update one region at a time rather than all at once. We wanted to make this as quick and smooth as possible to avoid any long periods of downtime. However, a few of our nodes were running CentOS 6, which the new update does not support. We did not expect this, and it impacted the 7 nodes that were on this legacy operating system. On these nodes we had to revert the panel version and reboot again to keep them online. This required some additional configuration rework and resulted in extended downtime for these 7 nodes.
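A pre-update check of the kind that would flag unsupported nodes before an all-at-once update can be sketched as follows. This is a minimal illustration and not the actual tooling: the `Node` fields, the node names and the list of supported releases are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical node identifier
    region: str      # hypothetical region label
    os_release: str  # e.g. "centos6", as an inventory system might report it


# Assumed support list for the new panel version; the real list is not stated in this post.
SUPPORTED_RELEASES = frozenset({"centos7", "centos8"})


def preflight(nodes, supported=SUPPORTED_RELEASES):
    """Split nodes into (ready, blocked) according to OS support."""
    ready = [n for n in nodes if n.os_release in supported]
    blocked = [n for n in nodes if n.os_release not in supported]
    return ready, blocked


def can_update_all_at_once(nodes, supported=SUPPORTED_RELEASES):
    """An all-at-once update is only allowed when no node is blocked."""
    _, blocked = preflight(nodes, supported)
    return not blocked


if __name__ == "__main__":
    # Made-up fleet for demonstration only.
    fleet = [
        Node("node-01", "us-east", "centos7"),
        Node("node-02", "eu-west", "centos6"),
    ]
    ready, blocked = preflight(fleet)
    for node in blocked:
        print(f"{node.name} ({node.os_release}) is not supported; upgrade it before the update")
    print("Safe to update all nodes at once:", can_update_all_at_once(fleet))
```

Running a check like this ahead of the maintenance window would list the legacy nodes in advance, so they could be upgraded or excluded instead of being reverted and rebooted during the update.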
While this was happening, a completely unrelated event occurred at two of our data centers and took nodes at those data centers offline. We immediately contacted the data centers for a resolution, but unfortunately it took time for them to resolve the issue and bring the network back online. For some of our users, this caused additional downtime unrelated to the control panel update.
Staff were informed of the upcoming control panel update, but a critical part of our workflow was missed and we did not properly notify our customers of the downtime it would involve. We apologize again for this. Everyone should have been notified before the update so they could prepare for the maintenance.
Moving forward
We have learned a great deal from this and have identified what we did wrong, so that something on this scale does not happen again. We are going to run many more checks in the background to confirm that conditions are right before issuing an update, and we will improve communication with our data centers. In addition, when these updates are going to be performed, we will send out emails and post notifications on all our social media accounts, on Discord and through a notification system in the control panel itself. We greatly appreciate all of you who stuck with us during this, and your amazing patience as we worked to resolve it. We are truly sorry for the downtime that all of you experienced, and we assure you that we will continue to strive to remain the best Minecraft server hosting company in the world.