Main

August 30, 2010

A Cloud Balancing Act

Over at F5's Dev Central, Lori MacVittie talks about load balancing and the cloud:

When you decide to deploy an application to the cloud you have a very similar decision: do you extend your existing dwelling, essentially integrating the environment with your own or do you maintain a separate building that is still "on the premises" but not connected in any way except that it's accessible via your shared driveway.

Using a garage expansion as a metaphor for scaling laterally to the cloud is a great one, and captures a lot of the nuances to be dealt with.

I'd like to add a third option to Lori's first two, based on our experience with hundreds of enterprises -- the valet strategy. Rather than simply load balancing between multiple systems, a cloud can sit in front of multiple sites, performing application delivery functions, as well as load balancing betwixt the backend sites.

Many Akamai customers do this today. They may have several data centers of their own, and want to integrate cloud-based services seamlessly into their existing sites (like taking advantage of user prioritization to load balance between an application server and a cloud-based waiting room; or using storage in the cloud to provide failover services). Akamai's cloud accepts the end user request, and either takes care of the user locally, or gathers the necessary resources from among multiple backend systems to service the user. And that's a way to load balance transparently to the user.

May 6, 2010

Contracting the Common Cloud

After attending CSO Perspectives, Bill Brenner has some observations on contract negotiations with SaaS vendors. While his panel demonstrated a breadth of customer experience, it was, unfortunately, lacking in a critical perspective: that of a cloud provider.

Much of the point of SaaS, or any cloud service, in the economy of scale you get; not just in capacity, but also in features. You're selecting from the same set of features that every other customer is selecting from, and that's what makes it affordable. And that same set of features needs to extend up into the business relationship. As the panel noted, relationships around breach management, data portability, and transport encryption are all important, but if you find yourself arguing for a provider to do something it isn't already, you're likely fighting a Sisyphean battle.

But how did a customer get to that point? Enterprises typically generate their own control frameworks, in theory beginning from a standard (like ISO 27002), but then redacting out the inapplicable (to them), tailoring controls, and adding in new controls to cover incidents they've had in the past. And when they encounter a SaaS provider who isn't talking about security controls, the natural tendency is to convert their own control framework into contractual language. Which leads to the observations of the panel participants: it's like pulling teeth.

A common request I've seen is the customer who wants to attach their own security policy - often a thirty to ninety page document - as a contract addendum, and require the vendor to comply with it, "as may be amended from time to time by Customer". And while communicating desires to your vendors is how they'll decide to improve, no cloud vendor is going to be able to satisfy that contract.

Instead, vendors need to be providing a high-water mark of business and technology capabilities to their customer base. To do this, they should start back from those original control frameworks, and not only apply them to their own infrastructure, but evaluate the vendor-customer interface as well. Once implemented, these controls can then be packaged, both into the baseline (for controls with little or no variable cost), and into for-fee packages. Customers may not want (or want to pay for) all of them, but the vendors need to be ahead of their customers on satisfying needs. Because one-off contract requirements don't scale. But good security practices do.

December 18, 2009

Modeling Imperfect Adversaries

An important piece of risk assessment is understanding your adversaries. Often, this can degenerate into an assumption of perfect adversaries. Yet when we think about risk, understanding that our adversaries have different capabilities is critical to formulating reasonable security frameworks. Nowhere is this more true than in the streaming media space.

Brian Sniffen (a colleague at Akamai) recently presented a paper at FAST exploring ways of considering different adversaries, especially in the context of different business models. He presents some interesting concepts worth exploring if you're in the space:

  • Defense in breadth: The concept of using different security techniques to protect distinct attack surfaces, specifically looking at defeating how a specific adversary type is prone to conduct attacks.
  • Tag-limited adversaries: An extension to the Dolev-Yao adversary (a perfect adversary who sits inline on a communication stream), the tag-limited adversary may only have knowledge, capabilities, or desire to conduct attacks within a limited vocabulary.

His paper is also a good primer on thinking about streaming threat models.

December 10, 2009

DDoS thoughts

We are used to measuring the efficiency of DDoS attacks in ratios of bits-per-second. An attacker wants to consume many bits-per-second for each of his own bits-per-second that he uses. He would rather send one packet to generate a storm than have to send the storm himself. We can extend this efficiency measurement to other attacks. Let's use the name "flits per second" (for fully-loaded bits) for this more general measurement of cost and pain: sending or receiving one bit-per-second costs one flit-per-second. Doing enough computation to have an opportunity cost of one bit-per-second has a cost of one flit-per-second. Enough disk access to forestall one bit-per-second costs one flit-per-second. Now we can talk about the flit capacity of the attacker and defender, and about the ratio between flits consumed on each side during an attack.

From a defensive efficiency perspective, we have two axes to play with: first, reducing the flit-to-bit ratio of an attack, by designing optimal ways of handling traffic; and second, increasing the relative client cost to produce an attack.

To reduce the flitcost, consider the history of SYN floods. SYN floods work by sending only the first packet in the three way TCP handshake; the victim computer keeps track of the half-open connection after sending its response, and waits for the attacker's followup. That doesn't come; for the cost of a single SYN packet, the attacker gets to consume a sparse resource (half-open connections) for some period of time. The total amount of traffic needed historically was pretty minimal, until SYN cookies came along. Now, instead of using the sparse resource, targets use a little bit of CPU to generate a cryptographic message, embed it in their response, and proceed apace. What was a very effective attack has become rather ineffective; against most systems, a SYN flood has a lower flit-to-bit ratio than more advanced application layer attacks.

The other axis is more interesting, and shows why SYN floods are still prevalent even today: they're cheap to produce. They don't consume a lot of cycles on the attacking systems, and don't require interesting logic or protocol awareness. The fewer resources an attacker can consume, the more likely their attack will go unnoticed by the owners of the compromised hosts used in the attack (Case in point: look at how fast Slammer remediation happened. Why? ISPs were knocked offline by having infected systems inside). Many attack programs effectively reduce to "while (1) {attack}". If the attack is making an HTTP request, filtering the request will often generate a higher rate of requests, without changing the attacker's costs. If this higher rate has the same effect on you that the lower rate did, you didn't buy anything in your remediation. You might have been better off responding more slowly, than not at all.

In the general case, this leads us to two solution sets. Traffic filtering is the set of technologies designed to make handling attacks more efficient; either by handling the attack further out in an infrastructure, classifying it as malicious for a cheaper cost than processing it, or making processing cheaper.

Capacity increases, on the other hand, are normally expensive, and they're a risky gamble. If you increase far in excess of attacks you ever see, you've wasted money. On the other hand, if you increase by not quite enough, you're still going to be impacted by an event (and now, of course, you'll be criticized for wasting money that didn't help). Obligatory vendor pitch: this is where a shared cloud infrastructure, like a CDN, comes into play. Infrastructures that measure normal usage in terabits per second have a significantly different tolerance for attack capacity planning than most normal users.