Architecting for DDoS Defense

DDoS is back in the news again, given the recent post-Cyber Monday DDoS attacks and the Anonymous DDoS attacks targeted at various parties. This seems like a good time to remember the concepts you need in the front of your mind when you're designing to resist DDoS.

DDoS mitigation isn't a point solution; it's much more about understanding how your architecture might fail, and how efficient DDoS attacks can be. Sometimes, simply throwing capacity at the problem is good enough (in many cases, our customers just start with this approach, using our WAA, DSA, and EDNS solutions to provide that instant scalability), but how do you plan for when simple capacity might not be sufficient?

It starts with assuming failure: at some point, your delicate origin infrastructure is going to be overwhelmed. Given that, how can you begin pushing out as much functionality as possible into the edge? Do you have a set of pages that ought to be static, but are currently rendered dynamically? Make them cacheable, or set up a backup cacheable version. Put that version of your site into scaleable cloud storage, so that it isn't relying on your infrastructure.

For even dynamic content, you'd be amazed at the power of a short-term caching. A 2 second cache is all but unnoticeable to your users, but can offload significant attack traffic to your edge. Even a zero-second cache can be interesting; this lets your front end cache results, and serve them (stale) if they can't get a response from your application.

After you think about disaster resilience, you should start planning for the future. How can you authenticate users early and prioritize the requests of your known good users? How much dynamic content assembly can you do without touching a database? Can you store-and-forward user generated content when you're under heavy load?

The important point is to forget much of what we've been taught about business continuity. The holy grail of "Recovery Time Objective" (how long you can be down) shouldn't be the target, since you don't want to go down. Instead, you need to design for you Minimum Uninterrupted Service Target - the basic capabilities and services you must always provide to your users and the public. It's harder to design for, but makes weathering DDoS attacks much more pleasant.

Welcome AWS

By now you've certainly seen that Amazon Web Services has achieved PCI Compliance as a Level 1 merchant. This is great; the folks over at Amazon deserve congratulations on joining the cabal of compliant clouds. You've probably also seen the PCI pundits pithily pronouncing, "Just because AWS is PCI compliant, it doesn't mean they make you compliant by using them!". What does this mean?

Put simply, AWS, like Akamai's Dynamic Site Accelerator, is a platform over which your e-commerce flows. It's critically important that that platform is compliant if you're moving cardholder data through it (just as it's also important that PCI compliance is called out in your contract with AWS, per DSS 12.2). But more importantly, the way in which your organization, and your applications, handle credit card data is the key piece of getting compliant. What AWS's certification buys you is the same thing Akamai's does - the ability to exclude those components from your PCI scope.

(if you want to get compliant, the easiest way, of course, is to tokenize cardholder data before it hits your environment, using a service like EdgeTokenization).

A Cloud Balancing Act

Over at F5's Dev Central, Lori MacVittie talks about load balancing and the cloud:
When you decide to deploy an application to the cloud you have a very similar decision: do you extend your existing dwelling, essentially integrating the environment with your own or do you maintain a separate building that is still "on the premises" but not connected in any way except that it's accessible via your shared driveway.

Using a garage expansion as a metaphor for scaling laterally to the cloud is a great one, and captures a lot of the nuances to be dealt with.

I'd like to add a third option to Lori's first two, based on our experience with hundreds of enterprises -- the valet strategy. Rather than simply load balancing between multiple systems, a cloud can sit in front of multiple sites, performing application delivery functions, as well as load balancing betwixt the backend sites.

Many Akamai customers do this today. They may have several data centers of their own, and want to integrate cloud-based services seamlessly into their existing sites (like taking advantage of user prioritization to load balance between an application server and a cloud-based waiting room; or using storage in the cloud to provide failover services). Akamai's cloud accepts the end user request, and either takes care of the user locally, or gathers the necessary resources from among multiple backend systems to service the user. And that's a way to load balance transparently to the user.

Awareness Training

Implementing a good security awareness program is not hard, if your company cares about security. If they don't, well, you've got a big problem.

It doesn't start with the auditable security program that most standards would have you set up. Quoting PCI-DSS testing procedures:

12.6.1.a Verify that the security awareness program provides multiple methods of communicating awareness and educating employees (for example, posters, letters, memos, web based training, meetings, and promotions).

12.6.1.b Verify that employees attend awareness training upon hire and at least annually.

12.6.2 Verify that the security awareness program requires employees to acknowledge (for example, in writing or electronically) at least annually that they have read and understand the company's information security policy.

For many awareness programs, this is their beginning and end. An annual opportunity to force everyone in the business to listen to us pontificate on the importance of information security, and make them read the same slides we've shown them every year. Or, if you've needed to gain cost efficiencies, you've bought a CBT program that is lightly tailored for your business (and as a side benefit, your employees can have races to see how quickly they can click through the program).

But at least it's auditor-friendly: you have a record that everyone attended, and you can make them acknowledge receipt of the policy that they are about to throw in the recycle bin. And you have to have an auditor friendly program, but it shouldn't be all that you do.

I can tell you that, for our baseline, auditor-friendly security awareness program, over 98% of our employee base have reviewed and certified the requisite courseware in the last year; and that of the people who haven't, the vast majority have either started work in the last two weeks (and thus are in a grace period), or are on an extended leave. It's an automated system, which takes them to a single page. At the bottom of the page is the button they need to click to satisfy the annual requirement. No gimmicks, no trapping the user in a maze of clicky links. But on that page is a lot of information: why security is important to us; what additional training is available; links to our security policy (2 pages) and our security program (nearly 80 pages); and an explanation of the annual requirement. And we find that a large majority of our users take the time to read the supplemental training material.

But much more importantly, we weave security awareness into a lot of activities. Listen to our quarterly investor calls, and you'll hear our executives mention the importance of security. Employees go to our all-hands meetings, and hear those same executives talk about security. The four adjectives we've often used to describe the company are "fast, reliable, scalable, and secure". Social engineering attempts get broadcast to a mailing list (very entertaining reading for everyone answering a published telephone number). And that doesn't count all of the organizations that interact with security as part of their routine.

And that's really what security awareness is about: are your employees thinking about security when it's actually relevant? If they are, you've succeeded. If they aren't, no amount of self-enclosed "awareness training" is going to fix it. Except, of course, to let you check the box for your auditors.

Edge Tokenization

Visa released its Credit Card Tokenization Best Practices last week, giving implementors a minimum guide on how to implement tokenization. It's a good read, although if you're planning on building your own tokenizer, I'd strongly recommend reading Adrian Lane's take on the subject, including practices above and beyond Visa's for building good tokenization systems.

But I don't recommend building your own tokenizer, unless you're a payment gateway (but if you're going to, please read Adrian's guidance, and design carefully). The big goal of tokenization is to get merchants' commerce systems out of scope for PCI. And if you want to try to remove your systems from PCI scope, you should never see the credit card number.

That's why I'm really excited about Akamai's Edge Tokenization service. As discussed at, we've been beta testing a service that captures credit card data in a customer website, hands it to our partner gateways, and substitutes the returned token to our customer's systems.

Image of Akamai Edge Tokenization service.  Consumer credit card is entered into a form on a merchant website.  Akamai server captures the credit card, and sends it to a payment gateway for processing.   The payment gateway returns a token to Akamai, and Akamai delivers the token in the POST body to the merchant.   The merchant never sees the credit card.

We don't do the tokenization ourselves, so that we never have the ability to reverse the tokens. But the capture and replacement all happens inside our Level 1 merchant environment, so our customers get to simply reduce the number of their systems that see credit cards (potentially removing them from scope).

Update, 2015: The EdgeTokenization service has reached end-of-sale.

NSEC3: Is the glass half full or half empty?

NSEC3, or the "Hashed Authenticated Denial of Existence", is a DNSSEC specification to authenticate the NXDOMAIN response in DNS. To understand how we came to create it, and the secrecy issues around it, we have to understand why it was designed. As the industry moves to a rollout of DNSSEC, understanding the security goals of our various Designed Users helps us understand how we might improve on the security in the protocol through our own implementations.

About the Domain Name Service (DNS)

DNS is the protocol which converts mostly readable hostnames, like, into IP addresses (like At its heart, a client (your desktop) is asking a server to provide that conversion. There are a lot of possible positive answers, which hopefully result in your computer finding its destination. But there are also some negative answers. The interesting answer here is the NXDOMAIN response, which tells your client that the hostnames does not exist.

Secrecy in DNS

DNS requests and replies, by design, have no confidentiality: anyone can see any request and response. Further, there is no client authentication: if an answer is available to one client, it is available to all clients. The contents of a zone file (the list of host names in a domain) are rarely publicized, but a DNS server acts as a public oracle for the zone file; anyone can make continuous requests for hostnames until they reverse engineer the contents of the zone file. With one caveat: the attacker will never know that they are done, as there might exist hostname that they have not yet tried.

But that hasn't kept people from putting information that has some form of borderline secrecy into a zone file. Naming conventions in zone files might permit someone to easily map an intranet just looking at the hostnames. Host names might contain names of individuals. So there is a desire to at least keep the zone files from being trivially readable.

DNSSEC and authenticated denials

DNSSEC adds in one bit of security: the response from the server to the client is signed. Since a zone file is (usually) finite, this signing can take place offline: you sign the contents of the zone file whenever you modify them, and then hand out static results. Negative answers are harder: you can't presign them all, and signing is expensive enough that letting an adversary make you do arbitrary signings can lead to DoS attacks. And you have to authenticate denials, or an adversary could poison lookups with long-lived denials.

Along came NSEC. NSEC permitted a denial response to cover an entire range (e.g., there are no hosts between and Unfortunately, this made it trivial to gather the contents of a zone: after you get one range, simply ask for the next alphabetical host ( and learn what the next actual host is ( From a pre-computation standpoint, NSEC was great - there are the same number of NSEC signed responses in a zone as all other signatures - but from a secrecy standpoint, NSEC destroyed what little obscurity existed in DNS.


NSEC3 is the update to NSEC. Instead of providing a range in which there are no hostnames, a DNS server publishes a hashing function, and a signed range in which there are no valid hashes.. This prevents an adversary from easily collecting the contents of the zone (as with NSEC), but does allow them to gather the size of the zone file (by making queries to find all of the unused hash ranges), and then conduct offline guessing at the contents of the zone files (as Dan Bernstein has been doing for a while). Enabling offline guessing makes a significant difference: with traditional DNS, an adversary must send an arbitrarily large number of queries (guesses) to a name server (making them possibly detectable); with NSEC, they must send as many queries as there are records; and with NSEC3, they must also send the same number of requests as there are records (with some computation to make the right guesses), and then can conduct all of their guessing offline.

While NSEC3 is an improvement from NSEC, it still represents a small step down in zone file secrecy. This step is necessary from a defensive perspective, but it makes one wonder if this is the best solution: why do we still have the concept of semi-secret public DNS names? If we have a zone file we want to keep secret, we should authenticate requests before answering. But until then, at least we can make it harder for an adversary to determine the contents of a public zone.

"Best" practices in zone secrecy

If you have a zone whose contents you want to keep obscure anyway, you should consider:
  • Limiting access to the zone, likely by IP address.
  • Use randomly generated record names, to make offline attacks such as Dan Bernstein's more difficult.
  • Fill your zone with spurious answers, to send adversaries on wild goose chases.
  • Instrument your IDS system to detect people trying to walk your zone file, and give them a different answer set than you give to legitimate users.

Jason Bau and John Mitchell, both of Stanford, have an even deeper dive into DNSSEC and NSEC3.

Credit Card Tokenization

Every merchant that takes credit cards undergoes an annual ritual known as their PCI audit. Why? Because merchants take credit cards from end users, and the Payment Card Industry Data Security Standard (PCI-DSS) requires the annual audit of all environments that store cardholder data. Unfortunately, PCI audits are complicated, expensive, and distracting. But new technology may make them less necessary.

Consider what the credit card really is: It's a promise of payment; an agreement between an end user and their bank (and, by proxy, the entire card network) that enables the user to promise funds to a merchant. That's also the weakness of the system: the 16 digits of the card (plus a few other data points, like expiration date, CVV, and name) are equivalent to a public currency.

And that currency needs to be stored in a lot of places, for a long period of time. For user convenience and increased revenues, merchants want to support repeat buying without card reentry, so they need to keep the card (and to support chargebacks). For fraud detection, merchants want to check for card overuse. This leads to a threat to the currency, and much of the purpose behind the (PCI-DSS): cards at rest are a valuable, and omnipresent target.


Tokenization is a technology that replaces the credit card (a promise between a user and their bank) with a token (a promise between a merchant and their bank). With tokenization, when presented a credit card, a merchant immediately turns it over to their payment gateway, and receives a token in exchange. The token is only useful to that merchant when talking to their gateway; the public currency (credit card usable anywhere) has been converted to private scrip (token tied to merchant and gateway).

The gateway has to maintain a mapping between tokens and credit cards, so that when a merchant redeems a token, the gateway can convert the token to a credit card charge. But a merchant can use the token in their internal systems for anything they would have used a credit card for in the past: loyalty programs, fraud detection, chargebacks, or convenience.

Tokenization isn't a panacea for a merchant. The merchant still has a point in their infrastructure that takes credit cards, and that point is subject to PCI-DSS. If the merchant has a brick and mortar operation, PCI-DSS requirements will still apply there (although some POS terminals do support tokenization). But it can take a merchant a long way to defending their customers' credit cards, and represents several steps forward to a model where merchants don't need to see a credit card at all.

Skynet or The Calculor?

Technology-based companies can either be organized around silicon (computers), or carbon (people). Depending on which type of company you are, security practices are, by necessity, different. So why then do we delude ourselves into thinking that there is one perfect set of security practices?

In a silicon-based organization, business processes are predominantly driven by predefined rules, implemented in computing systems (think: call centers). Where humans serve a role, it is often as an advanced interactive voice recognition (IVR) system, or as an expert system to handle cases not yet planned for (and, potentially, to make your customers happier to talk to a human). Security systems in this type of a world can be designed to think adversarially about the employees -- after all, the employees are to be constrained to augment the computing systems which drive the bulk of the business.

In a carbon-based organization, business processes are organized around the human, and are typically more fluid (think: software engineering, marketing). The role of computers in a carbon-based organization is to augment the knowledge workers, not constrain them. The tasks and challenges of a security system are far different, and require greater flexibility -- after all, employees are to be supported when engaging in novel activities designed to grow the business.

Consider a specific problem: data leakage. In a silicon-based organization, this is a (relatively) simple problem. Users can be restricted in exporting data from their systems, consumer technology more advanced than pen and paper can be banned from the facility, and all work must be done in the facility. Layer on just-in-time access control (restricting a user to only access records that they have been assigned to), and you've limited the leakage ... to what a user can remember or write down. And that's still a problem: twenty years ago, I worked in a company that used social security numbers as the unique identifier for its employees. Two decades later, a handful of those numbers are still rattling around in my head, a deferred data leakage problem waiting to happen.

Now compare that simple scenario against a knowledge worker environment. People are very mobile, and need to access company data everywhere. Assignments show up by word of mouth, so users need quick access to sources of data they had not seen before. Users are on multiple platforms, even when performing the same job, so system and application management is a challenge. Interaction with customers and partners happens rapidly, with sensitive data flying across the wires. Trying to prevent data leakage in this world is a Herculean task. Likely, given the impossibility of the task, most security implementations here will reduce business flexibility, without significantly reducing business risk.

But what can the enterprising security manager do? First, understand your business. If it's a silicon-based organization, you're in luck. Most security vendors, consultants, and best practices guides are looking to help you out (and take some of your money while doing so). If, on the other hand, you're in a carbon-based business, you've got a much harder task ahead of you. Most solutions wont help a lot out of the box, and risk acceptance may be so endemic to your organizational mindset that changing it may well feel impossible. You'll need to focus on understanding and communicating risk, and designing novel solutions to problems unique to your business. Sounds hard, but take heart: it's not like the silicon-based security team is going to get it right, either.

Contracting the Common Cloud

After attending CSO Perspectives, Bill Brenner has some observations on contract negotiations with SaaS vendors. While his panel demonstrated a breadth of customer experience, it was, unfortunately, lacking in a critical perspective: that of a cloud provider.

Much of the point of SaaS, or any cloud service, in the economy of scale you get; not just in capacity, but also in features. You're selecting from the same set of features that every other customer is selecting from, and that's what makes it affordable. And that same set of features needs to extend up into the business relationship. As the panel noted, relationships around breach management, data portability, and transport encryption are all important, but if you find yourself arguing for a provider to do something it isn't already, you're likely fighting a Sisyphean battle.

But how did a customer get to that point? Enterprises typically generate their own control frameworks, in theory beginning from a standard (like ISO 27002), but then redacting out the inapplicable (to them), tailoring controls, and adding in new controls to cover incidents they've had in the past. And when they encounter a SaaS provider who isn't talking about security controls, the natural tendency is to convert their own control framework into contractual language. Which leads to the observations of the panel participants: it's like pulling teeth.

A common request I've seen is the customer who wants to attach their own security policy - often a thirty to ninety page document - as a contract addendum, and require the vendor to comply with it, "as may be amended from time to time by Customer". And while communicating desires to your vendors is how they'll decide to improve, no cloud vendor is going to be able to satisfy that contract.

Instead, vendors need to be providing a high-water mark of business and technology capabilities to their customer base. To do this, they should start back from those original control frameworks, and not only apply them to their own infrastructure, but evaluate the vendor-customer interface as well. Once implemented, these controls can then be packaged, both into the baseline (for controls with little or no variable cost), and into for-fee packages. Customers may not want (or want to pay for) all of them, but the vendors need to be ahead of their customers on satisfying needs. Because one-off contract requirements don't scale. But good security practices do.

The Adaptive Persistent Threat

Much ado has been made of the "Advanced Persistent Threat". Unfortunately, pundits look at the ease of some of the attacks, and get hung up on the keyword, "Advanced." How do we know the adversary is so advanced, if he can succeed using such trivial attacks? The relative skill of the adversary is actually uninteresting; it is his persistence.

Many threat models focus on vulnerability and exposures. This creates an assumption of an ephemeral adversary; one who has a limited toolbox, and will attack at random until he finds an ill-defended target. This leads to planning to simply not be the easiest target ("Do you have to be faster than a bear to outrun it? No, you just need to be faster than the other guy. Or tie his shoelaces together").

Unfortunately, in a world of persistent threats, this may leave you open to other attacks. Consider the difference between protecting yourself against a random mugging and dealing with a stalker. Even if a stalker is relatively primitive, they will adapt to your defenses, and present a very real threat.

So let's drop the "Advanced" and replace it with "Adaptive": the "Adaptive Persistent Threat". In the face of this APT, unless you know yourself to be invulnerable (implausible), you should worry also about secondary targets. Once the APT has penetrated your first line of defenses, what can they do to you? How do you defend yourself?

Why is PCI so successful?

While at the RSA Conference, I took an afternoon out to head over to Bsides, where I participated in a panel (video in two parts) looking at the PCI Data Security Standard, and its impact on the industry. The best comment actually came to me in email afterward:

It's worth remembering that, no matter what your opinion of
PCI, the simple fact of the panel discussion today says something
impressive about the impact the standard has had, and the quality of
the industry's response.  Other areas of infosec haven't matured
nearly as much, or as quickly.

Far more than any standard to date, PCI has improved the state of security across the board. Some of that is in its simplicity -- the bulk of the control objectives are clearly written, and easy to implement against. Some is in its applicability -- it crosses industries like few other standard. Even more, I think, is in its narrowness. Rather than trying to improve security everywhere (and failing), it focuses on one narrow piece of data, and aims to protect that everywhere.

This isn't to say that PCI is the best we could have. Far from it. But it's the best we have, and we should look at its model and learn for future compliance standards.

Why don't websites default to SSL/TLS?

When a client connects on TCP port 80 to a webserver, one of the first thing it sends is a line that tells the server what website it wants. This line looks like this:
This tells the webserver which configuration to use, and how to present content to the end-user. This effectively abstracts TCP and IP issues, and lets websites and webservers interact at the discretion of the owner. The designed user of HTTP is the administrator, who may need to host dozens of websites on a smaller number of systems.

Secure Socket Layer (SSL) and its successor, Transport Layer Security (TLS), on the other hand, were designed for exactly the opposite user. The designed user of HTTPS is the paranoid security academic, who doesn't even want to tell a server the hostname it is looking for (the fact that you were willing to tell a DNS server is immaterial). In essence, SSL requires that any server IP address can have only one website on it. When a client connects to a webserver on port 443, the first thing it expects is for the server to provide a signed certificate, that matches the hostname that the client has not yet sent. So if you connect to via SSL, you'll note you get back a different certificate: one for This is expected -- nothing on this hostname requires encryption. Similarly, for Akamai customers that are delivering whole-site content via Akamai on hostnames CNAMEd to Akamai's domain, attempts to access these sites via SSL will result in a certificate for being returned. (Aside: customers use our SSL Object Caching service to deliver objects on the hostname; customers who want SSL on their own hostnames use our WAA and DSA-Secure services).

The designs of HTTP and HTTPS are diametrically opposed, and the SSL piece of the design creates horrendous scaling problems. The server you're reading this from serves over 150,000 different websites. Those sites are actually loadbalanced across around 1600 different clusters of servers. For each website to have an SSL certificate on this network, we'd need to consume around 250 million IP addresses - or 5.75% of the IPv4 space. That's a big chunk of the 9% left as of today. Note that there isn't a strong demand to put most sites on SSL; this is just elucidating why, even if there were demand, the sheer number of websites today makes this infeasible.

Fortunately, there are paths to a solution.

Wildcard certificates
For servers that only serve hostnames in one domain, a wildcard certificate can help. If, for instance, in addition to, I had,, and, instead of needing four certificates (across those 1800 locations!), I could use a certificate for *, which would match all four domains, as well as future growth of hostnames.

You're still limited; a wildcard can only match one field, so a certificate for * wouldn't work on a site named Also, many security practitioners, operating from principles of least privilege, frown on wildcard certificates, as they can be used even for unanticipated sites in a domain.

Subject Alternate Name (SAN) certificates
A slightly more recent variety of certificate are the SAN certificates. In addition to the hostname listed in the certificate, an additional field lets you specify a list of valid hostnames for that certificate (If you look closely, the certificate on this host has a SAN field set, which includes both and * This permits a server to have multiple, disparate hostnames on one interface.

On the downside, you still only have one certificate, which is going to get larger and larger the more hostnames you have (which hurts performance). It also ties all of those hostnames into one list, which may present brand and security issues to some enterprises.

Server Name Indication (SNI)
The long term solution is a feature for TLS called Server Name Indication (SNI). This extension calls for the client to, as part of the initial handshake, indicate to the server the name of the site it is looking for. This will permit a server to select the appropriate one from its set of SSL certificates, and present that.

Unfortunately, SNI only provides benefit when everyone supports it. Currently, a handful of systems don't support SNI, most notably Windows XP and IIS. And those two major components are significant: XP accounts for 55-60% of the browser market, and IIS looks to be around 24%. So it'll be a while until SNI is ready for primetime.

The Designed User

One of the things I've always liked about Apple technology is that every system feels like it was designed for a specific individual. The more you're like that individual, the more you like their technology. This isn't unique to Apple -- most technology capabilities are designed for a specific problem space -- Apple is just clearer about it. As a security professional, I like to understand who a specific technology is designed for ("The Designed User") as part of assessing risks involved.

As an example of designed users, take the new Twitter "retweet" functionality. (For those of you new to Twitter: Twitter permits people to post 140-character tweets. Interesting tweets are often "retweeted" by prepending "RT @username (original tweet here)", sometimes with some commentary appended. Twitter has another setting: whenever someone puts your username into a tweet, you see it.) The new retweet functionality, much maligned, allows a single click to retweet a tweet. The originating user does not see the new tweet.

The "old" retweet function -- really, a use created by users -- is perfect for the networking user. It often gets used to make a comment on someone else's tweet, while rebroadcasting it. I want to see every time someone retweets something I said (really, it doesn't happen that often). But I'm not the target of the new functionality: celebrities are. A large number of retweets are celebrity tweets being rebroadcast by their followers. If you're in that network, you want to minimize how many times you see the same retweet in your timeline. For those users, the new capability is easier, and far more preferred.

With any capability, we should always ask who the intended audience is as part of understanding the design space the developers were in. This may help us understand why certain security tradeoffs were chosen.

Evolution of DDoS

Last week, I spent some time with Bill Brenner (CSO Magazine) discussing DDoS attacks, and some insight into attacks over the last ten years. He put our discussion into a podcast.

Interview at ThreatChaos

Over the holidays, I spent some time on the phone with Richard Stiennon, talking a bit about Akamai and DDoS. The interview is up over at ThreatChaos.