DDoS Thoughts – Leadership In Security

We are used to measuring the efficiency of DDoS attacks in ratios of bits-per-second. An attacker wants to consume many bits-per-second for each of his own bits-per-second that he uses. He would rather send one packet to generate a storm than have to send the storm himself. We can extend this efficiency measurement to other attacks. Let’s use the name “flits per second” (for fully-loaded bits) for this more general measurement of cost and pain: sending or receiving one bit-per-second costs one flit-per-second. Doing enough computation to have an opportunity cost of one bit-per-second has a cost of one flit-per-second. Enough disk access to forestall one bit-per-second costs one flit-per-second. Now we can talk about the flit capacity of the attacker and defender, and about the ratio between flits consumed on each side during an attack.

From a defensive efficiency perspective, we have two axes to play with: first, reducing the flit-to-bit ratio of an attack, by designing optimal ways of handling traffic; and second, increasing the relative client cost to produce an attack.

To reduce the flitcost, consider the history of SYN floods. SYN floods work by sending only the first packet in the three way TCP handshake; the victim computer keeps track of the half-open connection after sending its response, and waits for the attacker’s followup. That doesn’t come; for the cost of a single SYN packet, the attacker gets to consume a sparse resource (half-open connections) for some period of time. The total amount of traffic needed historically was pretty minimal, until SYN cookies came along. Now, instead of using the sparse resource, targets use a little bit of CPU to generate a cryptographic message, embed it in their response, and proceed apace. What was a very effective attack has become rather ineffective; against most systems, a SYN flood has a lower flit-to-bit ratio than more advanced application layer attacks.

The other axis is more interesting, and shows why SYN floods are still prevalent even today: they’re cheap to produce. They don’t consume a lot of cycles on the attacking systems, and don’t require interesting logic or protocol awareness. The fewer resources an attacker can consume, the more likely their attack will go unnoticed by the owners of the compromised hosts used in the attack (Case in point: look at how fast Slammer remediation happened. Why? ISPs were knocked offline by having infected systems inside). Many attack programs effectively reduce to “while (1) {attack}“. If the attack is making an HTTP request, filtering the request will often generate a higher rate of requests, without changing the attacker’s costs. If this higher rate has the same effect on you that the lower rate did, you didn’t buy anything in your remediation. You might have been better off responding more slowly, than not at all.

In the general case, this leads us to two solution sets. Traffic filtering is the set of technologies designed to make handling attacks more efficient; either by handling the attack further out in an infrastructure, classifying it as malicious for a cheaper cost than processing it, or making processing cheaper.

Capacity increases, on the other hand, are normally expensive, and they’re a risky gamble. If you increase far in excess of attacks you ever see, you’ve wasted money. On the other hand, if you increase by not quite enough, you’re still going to be impacted by an event (and now, of course, you’ll be criticized for wasting money that didn’t help). Obligatory vendor pitch: this is where a shared cloud infrastructure, like a CDN, comes into play. Infrastructures that measure normal usage in terabits per second have a significantly different tolerance for attack capacity planning than most normal users.