A little bruised, but we're still here.

04 Sep 2018 12:02 by Alexander John (CML)
Length: 2 minutes, 1 second (404 words)

We haven’t disappeared; we’re still here. The computer gods just haven’t been favouring us for the last couple of days.

One of our web servers, NENE, has been offline for nearly 48 hours due to events that are beyond our control. As this issue is not likely to be fixed before Friday (at the earliest), we have temporarily moved all affected websites including our own to a different server.

Our standard mail services have not been affected and are working as normal. We deliberately keep our web hosting and mail services separate for exactly this kind of situation.

So, what happened?

Over the past few weeks, our NENE web server has been misbehaving for no apparent reason. This behaviour was not visible to our customers or visitors, but it was noticeable to us through our daily checks and monitoring.

As any investigation could interrupt service, we held off until last weekend. Unfortunately, despite many hours of work, we couldn’t trace the problem, and we suspected that Windows itself may have been corrupted beyond easy repair.

As NENE is actually a virtual server, the quickest and easiest solution was to replace it with a brand new virtual machine (we had backed up all data before taking the server offline). With virtual servers, installation and configuration are typically quicker than with physical hardware, so we expected to be offline for only a couple of hours.

So we requested a new virtual machine, a process that typically takes a few minutes. It didn’t. It took over six hours for our service provider to provision the new server. Once it was available, the new virtual server was far from healthy; one of its more obvious problems was that it hung or crashed whenever it was restarted.

We’re currently sitting in a very frustrating limbo as our service provider attempts to find a fix. They’ve managed to independently replicate the problem but have yet to discover a cure.

Update 17:00, 4th September

The latest update we have from our service provider is that they have managed to replicate the problem in their test environment and are currently investigating the underlying cause.

Update 15:30, 5th September

Our service provider has traced the issue. However, the fix requires changes to their infrastructure that are unlikely to be completed before Friday (7th September).

Copyright © 2011 - 2024 Calzada Media Limited. All Rights Reserved