Service Disruption - 28th June 19:20

Customers are reporting that they are unable to connect to DXI services. We have engineers investigating the problem.

19:50 - We are continuing to investigate the cause of the problems. We apologise for the problems this is causing. 

20:14 - We are continuing to investigate the cause of the problems. We apologise for the problems this is causing. 

20:46 - Engineers have discovered the cause of the problem, and are working to resolve the issue.

21:11 - We continue working to resolve this ongoing issue.

21:44 - The issue has now been resolved and the service is available again. We apologise once again for the problems caused.

 

Detailed Fault History :-
 
 
Sunday 28th June 2015 18:09:44   "Too Many Connections" error message found in logs.  
 
19:25:49 Engineers diagnosed the cause to be packet loss on the fibre interface card, and flipped over to the secondary backup interface.
The backup interface would not come 'up', so the primary was re-enabled, and at that point it appeared to be operating normally.
A decision was made not to elevate a slave, as this is a slower process and two customers were reporting that they needed to get back to work immediately.
At approximately 9pm the primary interface again exhibited the packet loss, and engineers attended the datacentre to be on hand should a server restart be needed, a server restart being a much faster process than a slave elevation.
A network interface restart was attempted first, and this cleared the problem, so a server restart was not needed.
After one hour with no further instances of the issue, the server was left to operate normally.
 
Monday 29th June 2015
 
At 06:30:14 the server again suffered packet loss, and the on-call engineer restarted the interface multiple times until the issue cleared at 07:13:32.
An engineer was then dedicated to restarting the interface as soon as packet loss was seen, should a recurrence occur (a sketch of how such a check could be automated follows this timeline).
The issue occurred again at 10:59:43, and the network interface was restarted, clearing the issue at 11:00:15.
A decision was made to run copper cables to the spare Ethernet interfaces on the machine, with a view to flipping over to copper if the fibre card again had problems.
Engineers ran cabling between data rooms (the main room is cabled for optical only) and undertook a design review of the required network configuration changes.
At 15:40:41 the optical interface started experiencing problems again; the interface was restarted and the copper interface was brought up as primary at 15:42:14.
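
The watch-and-reset duty described above was carried out manually by an engineer. Purely as an illustration, the sketch below shows how such a check could be automated on a Linux host: ping a peer, parse the reported packet loss, and bounce the interface if loss is seen. The interface name (eth1), peer address, loss threshold and check interval are assumptions made for the example, not values from our configuration, and this is not the script our engineers used.

#!/usr/bin/env python3
"""Illustrative packet-loss watch: ping a peer, restart the interface on loss.

All values below are hypothetical; the real interface name, peer address and
threshold are not stated in the fault history above.
"""
import re
import subprocess
import time

IFACE = "eth1"          # hypothetical interface name
PEER = "192.0.2.1"      # hypothetical peer to ping (documentation address)
LOSS_THRESHOLD = 1.0    # percentage loss that triggers a restart
CHECK_INTERVAL = 30     # seconds between checks

def packet_loss_percent() -> float:
    """Send a short burst of pings and parse the '% packet loss' figure."""
    result = subprocess.run(
        ["ping", "-c", "10", "-W", "1", PEER],
        capture_output=True, text=True,
    )
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", result.stdout)
    return float(match.group(1)) if match else 100.0

def restart_interface(iface: str) -> None:
    """Bounce the interface, as the engineers did by hand (requires root)."""
    subprocess.run(["ip", "link", "set", "dev", iface, "down"], check=True)
    time.sleep(1)
    subprocess.run(["ip", "link", "set", "dev", iface, "up"], check=True)

if __name__ == "__main__":
    while True:
        loss = packet_loss_percent()
        if loss >= LOSS_THRESHOLD:
            print(f"{loss}% packet loss to {PEER}; restarting {IFACE}")
            restart_interface(IFACE)
        time.sleep(CHECK_INTERVAL)

In practice a script like this would need root privileges for the ip commands and would alert the on-call engineer as well as restarting the interface.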
 
No further instances of packet loss have been experienced :-
--- ping statistics ---
51665 packets transmitted, 51665 received, 0% packet loss, time 51666637ms
rtt min/avg/max/mdev = 0.279/0.418/5.716/0.097 ms
 
Next Actions :-
 
The server is currently using an optical interface as its backup, which is known to be faulty.
A short maintenance window of approximately one minute is required to force-restart the interfaces and bring up a new copper interface in place of the optical interface as the backup (sketched below).
This will be undertaken today, Tuesday 30th, at 9pm. Once this is completed and no further issues are found, the ticket will be closed.
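
As an illustration of what the planned swap amounts to (the interface names are assumptions, not our real configuration: 'eth2' stands for the faulty optical card and 'eth3' for the new copper port), the change takes the optical interface out of service and brings a copper interface up in its place as the backup:

#!/usr/bin/env python3
"""Illustrative sketch of the planned backup-interface swap (hypothetical names)."""
import subprocess
import time

OPTICAL_BACKUP = "eth2"  # hypothetical: faulty optical interface, current backup
COPPER_BACKUP = "eth3"   # hypothetical: copper interface to become the new backup

def link_is_up(iface: str) -> bool:
    """Check the operational state reported by 'ip link show'."""
    out = subprocess.run(["ip", "link", "show", "dev", iface],
                         capture_output=True, text=True, check=True).stdout
    return "state UP" in out

if __name__ == "__main__":
    # Take the faulty optical interface out of service (requires root)...
    subprocess.run(["ip", "link", "set", "dev", OPTICAL_BACKUP, "down"], check=True)
    # ...and bring the copper interface up to act as the new backup.
    subprocess.run(["ip", "link", "set", "dev", COPPER_BACKUP, "up"], check=True)
    time.sleep(2)
    print(f"{COPPER_BACKUP} up: {link_is_up(COPPER_BACKUP)}")

The exact procedure used during the window will depend on how failover between the interfaces is configured, which is not covered in this note.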