DDoS incident report: August 18, 2011

Summary

On August 18, 2011, our network was the target of a distributed denial-of-service (DDoS) attack from a large number of hosts in Pakistan and India. The attack started around 18:30 UTC, and monitoring ran with degraded performance between 19:00 and 20:20 UTC. After we intentionally took down our portal to bring the check frequency back to normal levels, monitoring recovered and messages queued for delivery were sent out via the remote gateways.

With the help of our hosting provider, RackSpace, our team was able to mitigate the attack using blacklists and to identify the targeted IP addresses, which allowed us to bring the portal pages back. As of this writing, the attack is still ongoing, with traffic running at three to six times our usual levels. We continue to take proactive measures so that we can react to any changes in the situation.

What we have learned so far

DDoS attacks are difficult to control in general, but we have learned a lot from these events. The biggest issue was that our fail-over location could not activate itself, because the core services were still running. We will investigate how to improve this without causing unnecessary duplicate probes or alarms to be sent out.

Second, we learned that our main portal services are located too close to the core monitoring services in our network, so trouble in one can affect the other. We now plan to physically separate these services, so that in the future we will not have to take down the portal to free bandwidth for the monitoring services.

That said, I want to give huge thanks to the stand-by team (Kalina, Dimi and Stratos), who greatly helped reduce the impact of the attack so far by working as a team on several different tracks in parallel. I also want to thank RackSpace for the help from their knowledgeable and fanatical support team.

Timeline

  • 18:34 UTC The response team was first alerted to reduced connectivity to our systems (30-60% packet loss).
  • 18:46 UTC Contacted RackSpace support.
  • 18:59 UTC RackSpace identified the issue as a DDoS attack from the Pakistan/India region and added an initial set of /16 subnets to our blacklist in an attempt to mitigate the attack.
  • 19:20 UTC Began continuously adding /24 subnets to our blacklist.
  • 20:01 UTC Discussed placing an additional protection layer with RackSpace to fence off the attack, but these measures would take up to 3 hours to set up.
  • 20:20 UTC Intentionally brought down the portal website to free up resources for core monitoring services.
  • 21:03 UTC Identified the targeted IP addresses and took them down.
  • 21:10 UTC Rerouted all services on the identified IP addresses elsewhere.
  • 21:10 UTC Verified that pending alerts from the last 30 minutes were now being sent out correctly.
  • 21:30 UTC Brought back the web services, excluding the targeted IP addresses.
  • 22:56 UTC Brought back the affected Jabber services and verified that XMPP alerts were being sent out.
  • 09:15 UTC (next day) Fixed a redirect problem on the watchmouse.com domain.

Thank you for your understanding. We will update this post as noteworthy events occur.

Pieter Ennes
Senior Director of Engineering, Artificial Monitoring
Nimsoft / CA Technologies (formerly WatchMouse)