
February 16, 2011

Malware hunting

Today at the RSA Conference, Akamai Principal Security Architect Brian Sniffen is giving a talk titled "Scanning the Ten Petabyte Cloud: Finding the malware that isn't there." In the talk, he discusses the challenges of hunting for malware hooks in stored HTML pages of unspecified provenance, along with some tips and tricks for finding this malicious content.

In conjunction with his talk, Akamai is releasing the core source code for our vscan software. The source code is released under the 3-clause BSD license.

We are hopeful that our experiences can be helpful to others looking for malware in their HTML.

January 27, 2011

Tanstaafl

I was reading Rafal Los, over at HP's Following the White Rabbit blog, discussing whether anonymous web browsing is even possible:

Can making anonymous surfing still sustain the "free web" concept? - Much of the content you surf today is free, meaning, you don't pay to go to the site and access it. Many of these sites offer feature-rich experiences, and lots of content, information and require lots of work and upkeep. It's no secret that these sites rely on advertising revenue at least partly (which relies on tracking you) to survive ...if this model goes away what happens to these types of sites? Does the idea of free Internet content go away? What would that model evolve to?

This is a great point, although it is too narrow, limited to the gratis view of the web -- the allegedly free content. While you can consider web browsing to be strictly transactional, it is more readily understood as a world of relationships. For instance, you may read a lot of news, but reading a single news article is just one transaction in an ongoing relationship between you and a news provider.

A purchase from an online retailer? Let's consider a few of the relationships:

The buyer's relationship with:
  • the merchant
  • their credit card / alternative payment system
  • the receiver
The merchant's relationship with:
  • the payment gateway
  • the manufacturer
  • the shipping company
The payment card industry's interrelationships: payment gateways, acquiring banks, card brands, and card issuers all have entangled relationships.

The web is a world filled with fraud, and fraud lives in the gaps between these relationships. (Often, a relationship is only used one way: the buyer gives a credit card to the merchant, who gives it to their gateway, who passes it into the banking system. If the buyer simply notified their bank of every transaction, fraud would be hard; the absence of that notification is a gap in the transaction.) The more a merchant understands about their customer, the lower their costs can be.

Of course, this model is harder to perceive in the gratis environment, but is nonetheless present. First, let's remember:

If you're not paying for something, you're not the customer; you're the product being sold.

Often, the product is simply your eyeballs; but your eyeballs might have more value the more the merchant knows about you. (Consider the low value of the eyeballs of your average fantasy football manager. If the merchant knows from past history that those eyeballs are also attached to a person in the market for a new car, they can sell more valuable ad space.) And here, the more value the merchant can capture, the better the services they can provide to you.

A real and fair concern is whether the systemic risk added by the merchant in aggregating information about end users is worth the added reward received by the merchant and the user. Consider the risk of a new startup in the gratis world of location-based services. This startup may build a large database of its users' locations over time (consider the surveillance possibilities!), which, if breached, could compromise the privacy and safety of those individuals. Yet because that cost is not borne by the startup, they may perceive it as a reasonable risk to take for even a small return.

Gratis services - and even for-pay services - are subsidized by the exploitable value of the data collected. Whether or not a business is fully monetizing that data, it's still a fair question to ask whether it can thrive without that revenue source.

December 16, 2010

Architecting for DDoS Defense

DDoS is back in the news again, given the recent post-Cyber Monday DDoS attacks and the Anonymous DDoS attacks targeted at various parties. This seems like a good time to remember the concepts you need at the front of your mind when you're designing to resist DDoS.

DDoS mitigation isn't a point solution; it's much more about understanding how your architecture might fail, and how efficient DDoS attacks can be. Sometimes, simply throwing capacity at the problem is good enough (in many cases, our customers just start with this approach, using our WAA, DSA, and EDNS solutions to provide that instant scalability), but how do you plan for when simple capacity might not be sufficient?

It starts with assuming failure: at some point, your delicate origin infrastructure is going to be overwhelmed. Given that, how can you begin pushing as much functionality as possible out to the edge? Do you have a set of pages that ought to be static, but are currently rendered dynamically? Make them cacheable, or set up a backup cacheable version. Put that version of your site into scalable cloud storage, so that it doesn't rely on your origin infrastructure.

Even for dynamic content, you'd be amazed at the power of short-term caching. A two-second cache is all but unnoticeable to your users, but it can offload significant attack traffic to your edge. Even a zero-second cache can be interesting; this lets your front end cache results and serve them (stale) if it can't get a response from your application.
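
To make that concrete, here's a minimal sketch (in Python, not our production code; fetch_from_origin is a hypothetical placeholder and the TTL is invented) of a front-end micro-cache that serves fresh entries within a short TTL and falls back to a stale copy when the origin can't answer:

    import time

    CACHE_TTL = 2.0          # "fresh" window in seconds; even 0 can be useful
    _cache = {}              # url -> (timestamp, body)

    def fetch_from_origin(url):
        # Placeholder for the real origin request.
        raise NotImplementedError

    def serve(url):
        now = time.time()
        entry = _cache.get(url)

        # Serve from cache while the entry is still fresh.
        if entry and now - entry[0] < CACHE_TTL:
            return entry[1]

        try:
            body = fetch_from_origin(url)
            _cache[url] = (now, body)
            return body
        except Exception:
            # Origin is overwhelmed or unreachable: serve the stale copy
            # rather than failing the end user.
            if entry:
                return entry[1]
            raise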

After you think about disaster resilience, you should start planning for the future. How can you authenticate users early and prioritize the requests of your known good users? How much dynamic content assembly can you do without touching a database? Can you store-and-forward user-generated content when you're under heavy load?
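
As one rough illustration of those last two ideas (a sketch with invented priorities; write_to_database is a hypothetical placeholder): requests from authenticated, known-good users can be queued ahead of anonymous ones, and user-generated writes can be acknowledged now and persisted later.

    import itertools
    import queue

    _order = itertools.count()            # tie-breaker so the queue never compares requests
    request_q = queue.PriorityQueue()     # lower priority number = served sooner
    deferred_writes = []                  # user-generated content to replay after the attack

    def enqueue(request, authenticated):
        # Known-good users get priority 0; everyone else waits behind them.
        request_q.put((0 if authenticated else 1, next(_order), request))

    def accept_ugc(submission, under_attack):
        if under_attack:
            # Store-and-forward: acknowledge the user immediately, write it later.
            deferred_writes.append(submission)
            return "accepted (queued for later processing)"
        return write_to_database(submission)

    def write_to_database(submission):
        # Placeholder for the real persistence layer.
        raise NotImplementedError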

The important point is to forget much of what we've been taught about business continuity. The holy grail of "Recovery Time Objective" (how long you can be down) shouldn't be the target, since you don't want to go down. Instead, you need to design for your Minimum Uninterrupted Service Target - the basic capabilities and services you must always provide to your users and the public. It's harder to design for, but it makes weathering DDoS attacks much more pleasant.

December 15, 2010

Welcome AWS

By now you've certainly seen that Amazon Web Services has achieved PCI compliance as a Level 1 service provider. This is great; the folks over at Amazon deserve congratulations on joining the cabal of compliant clouds. You've probably also seen the PCI pundits pithily pronouncing, "Just because AWS is PCI compliant doesn't mean that using them makes you compliant!" What does this mean?

Put simply, AWS, like Akamai's Dynamic Site Accelerator, is a platform over which your e-commerce flows. It's critically important that the platform is compliant if you're moving cardholder data through it (just as it's important that PCI compliance is called out in your contract with AWS, per DSS 12.8). But more importantly, the way your organization and your applications handle credit card data is the key piece of getting compliant. What AWS's certification buys you is the same thing Akamai's does - the ability to exclude those components from your PCI scope.

(If you want to get compliant, the easiest way, of course, is to tokenize cardholder data before it hits your environment, using a service like EdgeTokenization.)

August 30, 2010

A Cloud Balancing Act

Over at F5's Dev Central, Lori MacVittie talks about load balancing and the cloud:

When you decide to deploy an application to the cloud you have a very similar decision: do you extend your existing dwelling, essentially integrating the environment with your own or do you maintain a separate building that is still "on the premises" but not connected in any way except that it's accessible via your shared driveway.

The garage expansion is a great metaphor for scaling laterally to the cloud, and it captures a lot of the nuances that have to be dealt with.

I'd like to add a third option to Lori's first two, based on our experience with hundreds of enterprises -- the valet strategy. Rather than simply load balancing between multiple systems, a cloud can sit in front of multiple sites, performing application delivery functions, as well as load balancing betwixt the backend sites.

Many Akamai customers do this today. They may have several data centers of their own, and want to integrate cloud-based services seamlessly into their existing sites (like taking advantage of user prioritization to load balance between an application server and a cloud-based waiting room; or using storage in the cloud to provide failover services). Akamai's cloud accepts the end user request, and either takes care of the user locally, or gathers the necessary resources from among multiple backend systems to service the user. And that's a way to load balance transparently to the user.
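
A rough sketch of that valet decision (in Python, with invented hostnames and thresholds, not Akamai's actual logic; forward is a hypothetical placeholder): the edge answers from its own cache when it can, otherwise it picks the least-loaded healthy backend, and falls back to a waiting-room page when every origin is saturated.

    BACKENDS = ["https://dc1.example.com",      # hypothetical customer data centers
                "https://dc2.example.com"]
    WAITING_ROOM = "Please hold -- we'll be right with you."

    def handle_at_edge(path, cache, backend_load):
        # 1. Take care of the user locally if the edge already has the answer.
        if path in cache:
            return cache[path]

        # 2. Otherwise play valet: pick a healthy, lightly loaded backend.
        healthy = [b for b in BACKENDS if backend_load.get(b, 0.0) < 0.8]
        if not healthy:
            # Every origin is saturated; shed to the cloud-based waiting room.
            return WAITING_ROOM
        backend = min(healthy, key=lambda b: backend_load.get(b, 0.0))
        return forward(path, backend)

    def forward(path, backend):
        # Placeholder for the real proxied fetch to the chosen origin.
        raise NotImplementedError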

May 6, 2010

Contracting the Common Cloud

After attending CSO Perspectives, Bill Brenner has some observations on contract negotiations with SaaS vendors. While his panel demonstrated a breadth of customer experience, it was, unfortunately, lacking in a critical perspective: that of a cloud provider.

Much of the point of SaaS, or any cloud service, is the economy of scale you get: not just in capacity, but also in features. You're selecting from the same set of features that every other customer is selecting from, and that's what makes it affordable. And that same set of features needs to extend up into the business relationship. As the panel noted, terms around breach management, data portability, and transport encryption are all important, but if you find yourself arguing for a provider to do something it isn't already doing, you're likely fighting a Sisyphean battle.

But how did a customer get to that point? Enterprises typically generate their own control frameworks, in theory beginning from a standard (like ISO 27002), then removing the controls that don't apply to them, tailoring others, and adding new controls to cover incidents they've had in the past. And when they encounter a SaaS provider who isn't talking about security controls, the natural tendency is to convert their own control framework into contractual language. Which leads to the observation of the panel participants: it's like pulling teeth.

A common request I've seen is the customer who wants to attach their own security policy - often a thirty to ninety page document - as a contract addendum, and require the vendor to comply with it, "as may be amended from time to time by Customer". And while communicating desires to your vendors is how they'll decide to improve, no cloud vendor is going to be able to satisfy that contract.

Instead, vendors need to be providing a high-water mark of business and technology capabilities to their customer base. To do this, they should start back from those original control frameworks, and not only apply them to their own infrastructure, but evaluate the vendor-customer interface as well. Once implemented, these controls can then be packaged, both into the baseline (for controls with little or no variable cost), and into for-fee packages. Customers may not want (or want to pay for) all of them, but the vendors need to be ahead of their customers on satisfying needs. Because one-off contract requirements don't scale. But good security practices do.

December 18, 2009

Modeling Imperfect Adversaries

An important piece of risk assessment is understanding your adversaries. Often, this can degenerate into an assumption of perfect adversaries. Yet when we think about risk, understanding that our adversaries have different capabilities is critical to formulating reasonable security frameworks. Nowhere is this more true than in the streaming media space.

Brian Sniffen (a colleague at Akamai) recently presented a paper at FAST exploring ways of considering different adversaries, especially in the context of different business models. He presents some interesting concepts worth exploring if you're in the space:

  • Defense in breadth: The concept of using different security techniques to protect distinct attack surfaces, specifically aimed at defeating the way a given adversary type tends to conduct its attacks.
  • Tag-limited adversaries: An extension of the Dolev-Yao adversary (a perfect adversary who sits inline on a communication stream), the tag-limited adversary may only have the knowledge, capability, or desire to conduct attacks within a limited vocabulary (a toy sketch of the distinction follows below).
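
To make the distinction concrete, here is a toy model (in Python, invented for illustration and not taken from the paper; the message format and tag names are made up): a Dolev-Yao adversary can tamper with anything on the wire, while a tag-limited adversary only acts on messages whose tags fall within its limited vocabulary.

    # Toy illustration only; the message format and tags are invented.
    class DolevYaoAdversary:
        """Perfect inline adversary: can read, drop, or forge anything."""
        def can_tamper(self, message):
            return True

    class TagLimitedAdversary:
        """Only understands (or cares about) a limited vocabulary of messages."""
        def __init__(self, known_tags):
            self.known_tags = set(known_tags)

        def can_tamper(self, message):
            return message.get("tag") in self.known_tags

    # Example: an adversary that can attack license requests but not key rotation.
    adversary = TagLimitedAdversary({"license-request"})
    print(adversary.can_tamper({"tag": "license-request"}))  # True
    print(adversary.can_tamper({"tag": "key-rotation"}))     # False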

His paper is also a good primer on thinking about streaming threat models.

December 10, 2009

DDoS thoughts

We are used to measuring the efficiency of DDoS attacks in ratios of bits-per-second. An attacker wants to consume many bits-per-second for each of his own bits-per-second that he uses. He would rather send one packet to generate a storm than have to send the storm himself. We can extend this efficiency measurement to other attacks. Let's use the name "flits per second" (for fully-loaded bits) for this more general measurement of cost and pain: sending or receiving one bit-per-second costs one flit-per-second. Doing enough computation to have an opportunity cost of one bit-per-second has a cost of one flit-per-second. Enough disk access to forestall one bit-per-second costs one flit-per-second. Now we can talk about the flit capacity of the attacker and defender, and about the ratio between flits consumed on each side during an attack.
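
For example (the numbers here are invented, purely to show the bookkeeping), an attack that costs the attacker 1,000 bits per second to send but forces the defender to spend bandwidth, CPU time, and disk access has a flit ratio well above one:

    # Invented numbers, purely to illustrate the flit bookkeeping.
    attacker_flits = 1_000            # bits/sec spent sending the attack traffic
    defender_bandwidth = 40_000       # bits/sec of request and response traffic handled
    defender_cpu = 15_000             # computation with an opportunity cost of 15 kbit/sec
    defender_disk = 5_000             # disk access equivalent to another 5 kbit/sec

    defender_flits = defender_bandwidth + defender_cpu + defender_disk
    print(defender_flits / attacker_flits)   # 60.0 defender flits per attacker flit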

From a defensive efficiency perspective, we have two axes to play with: first, reducing the flit-to-bit ratio of an attack by designing optimal ways of handling traffic; and second, increasing the attacker's relative cost to produce an attack.

To reduce the flit cost, consider the history of SYN floods. SYN floods work by sending only the first packet in the three-way TCP handshake; the victim computer keeps track of the half-open connection after sending its response, and waits for the attacker's followup. That followup never comes; for the cost of a single SYN packet, the attacker gets to consume a scarce resource (a half-open connection slot) for some period of time. The total amount of traffic needed was historically pretty minimal, until SYN cookies came along. Now, instead of consuming that scarce resource, targets use a little bit of CPU to generate a cryptographic token, embed it in their response, and proceed apace. What was a very effective attack has become rather ineffective; against most systems, a SYN flood now has a lower flit-to-bit ratio than more advanced application-layer attacks.
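
A toy sketch of the SYN cookie idea (in Python, not any real TCP stack's implementation; the secret and field layout are simplified and invented): the server derives its initial sequence number from a keyed hash of the connection details plus a coarse timestamp, so the returning ACK can be validated without any per-connection state having been stored.

    import hashlib
    import hmac
    import time

    SECRET = b"rotate-me-periodically"      # invented key; real stacks rotate secrets

    def syn_cookie(src_ip, src_port, dst_ip, dst_port, now=None):
        # Coarse timestamp so a cookie only validates for a short window.
        t = int(now if now is not None else time.time()) // 64
        msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{t}".encode()
        digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
        # Use 32 bits of the MAC as the TCP initial sequence number.
        return int.from_bytes(digest[:4], "big")

    def ack_is_valid(src_ip, src_port, dst_ip, dst_port, acked_seq, now=None):
        # The final ACK acknowledges our ISN + 1; recompute it and compare.
        # (Real implementations also accept the previous time slice and encode the MSS.)
        expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port, now) + 1) & 0xFFFFFFFF
        return acked_seq == expected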

The other axis is more interesting, and it shows why SYN floods are still prevalent even today: they're cheap to produce. They don't consume many cycles on the attacking systems, and they don't require interesting logic or protocol awareness. The fewer resources an attack consumes, the more likely it is to go unnoticed by the owners of the compromised hosts used in the attack. (Case in point: look at how fast Slammer remediation happened. Why? ISPs were knocked offline by having infected systems inside.) Many attack programs effectively reduce to "while (1) {attack}". If the attack is making an HTTP request, filtering that request will often just let the loop spin faster, generating a higher rate of requests without changing the attacker's costs. If that higher rate has the same effect on you that the lower rate did, your remediation didn't buy you anything. You might be better off responding slowly than not responding at all.

In the general case, this leads us to two solution sets. Traffic filtering is the set of technologies designed to make handling attacks more efficient: handling the attack further out in the infrastructure, classifying traffic as malicious at a lower cost than processing it, or making the processing itself cheaper.
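
As one sketch of "classifying it as malicious for a cheaper cost than processing it" (Python, with invented thresholds; expensive_application_logic is a hypothetical placeholder): a per-client token-bucket check is a few arithmetic operations, far cheaper than rendering a dynamic page.

    import time
    from collections import defaultdict

    RATE = 5.0          # allowed requests per second per client (invented threshold)
    BURST = 10.0

    _buckets = defaultdict(lambda: {"tokens": BURST, "last": time.time()})

    def cheap_filter(client_ip):
        # Token-bucket check: a few arithmetic ops, far cheaper than real processing.
        b = _buckets[client_ip]
        now = time.time()
        b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
        b["last"] = now
        if b["tokens"] < 1.0:
            return False              # classify as (probably) malicious, drop early
        b["tokens"] -= 1.0
        return True

    def handle(client_ip, request):
        if not cheap_filter(client_ip):
            return "429 Too Many Requests"
        return expensive_application_logic(request)

    def expensive_application_logic(request):
        # Placeholder for the real request processing.
        raise NotImplementedError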

Capacity increases, on the other hand, are normally expensive, and they're a risky gamble. If you increase capacity far in excess of any attack you ever see, you've wasted money. On the other hand, if you increase it by not quite enough, you're still going to be impacted by an event (and now, of course, you'll be criticized for wasting money that didn't help). Obligatory vendor pitch: this is where a shared cloud infrastructure, like a CDN, comes into play. Infrastructures that measure normal usage in terabits per second have a significantly different tolerance for attack capacity planning than most normal users.