Closing the Skills Gap

This morning, I sat in on a panel titled “Closing the Cybersecurity Skills Gap.” Javvad Malik has curated a collection of tweet observations; I thought I’d expand and share a few of my own observations:

The “skills gap” that recruiters see isn’t the right one. Often, we hear about the skills gap from recruiters in the context of “I couldn’t find candidates that met your requirements.” But if the requirements included, “15 years of experience securing Windows7,” we don’t have a skills gap, we have a problem writing job descriptions.

An often missing skill is relating to the business. Our jobs as security professionals often puts us at odds with the business. Why? Because we strive to be the “conscience of the business” and stop it from taking certain risks. Our job is to help the business take risks - but to do so more wisely, through actionable knowledge. Since our business partners are the ones making the decisions to take risks, they are the ones who need to understand risks that impact their decisions.

Think systematically. Many of our training programs focus on providing building block skills; this gives people hammers (so that all problems look like nails). An underdeveloped skill is the ability to think holistically about systems.

Problem solving is a needed skill. Not simply identifying how to solve a problem; actually digging in and solving a problem is a critical skill. It might require building servers; installing applications; designing processes; analyzing data, and a dozen other sundry capabilities.

Communications and translations are key. Every job function has its own jargon - and being able to communicate in the jargon of the business is a critical capability. Having the ability to quickly learn about how current events or new technologies will affect your business, and then provide coherent summaries and advice to the business will be extremely helpful.

Be kind. The security profession has often celebrated being unkind and hurtful to each other and our business partners (think of The Wall of Sheep). Instead, we should be trying to understand them; to be helpful to them; and to understand how we can improve their world.

And some thoughts from prior blogposts: Certification isn’t a marker of mastery. Think about measuring value. Are you applying your skills to just compliance, or solving security problems like awareness training in novel ways?

Cognitive Injection: A reading list

Context: Tuesday, February 25th, I’m presenting “Cognitive Injection: Reprogramming the Situation-Oriented Human OS” at RSAC in Moscone West, Room 3005 at 4 pm (Pacific). My slides are here.

I’ve formed my opinion about how the human brain works with the assistance of some great contributors. Some of them are humans I hang out with, but many of them are authors and researchers; in the interest of helping others come to the same, or better, understanding, here’s a short reading list:
  • Daniel Kahneman; Thinking, Fast and Slow
  • James Reason; Human Error
  • Atul Gawande; The Checklist Manifesto
  • Christopher Chabris and Daniel Simons; The Invisible Gorilla
  • Sam Peltzman,;“The Effects of Automobile Safety Regulation”, Journal of Political Economy, 1975. (see also: The Peltzman Effect)

Whither HSMs (in the cloud)

Hardware Security Modules (HSMs) are physical devices attached or embedded in another computer to handle various cryptographic functions. HSMs are supposed to provide both physical and logical protection of the cryptographic material stored on the HSM while handling cryptographic functions for the computer to which they are attached.

As websites move to the cloud, are HSMs the right way to achieve our goals?

Before we talk about goals, it is useful to consider a basic model for talking about them. Our Safety team often uses the following model to consider whether a system is safe:
  • What are the goals we are trying to achieve? (Or, in Leveson's STPA hazard-oriented view, what are the accidents/losses which you wish to prevent?)
  • What are the adversaries we wish to defeat?
  • What are the powers available to those adversaries? What *moves* are available to them?
  • And finally, what controls inhibit adversaries' use of their powers, thus protecting our goals?
Our hazards (or unacceptable losses) are:
  • An adversary can operate a webserver that pretends to be ours;
  • An adversary can decrypt SSL traffic; and
  • An adversary can conduct a man-in-the-middle attack on our SSL website.
In the protection of SSL certificates in the cloud, it would seem that our goals are two-fold:
  • Keep the private key *secret* from third parties; and
  • Prevent unauthorized and undetected use of the key in cryptographic functions. While SSL certificate revocation is a weak control (many browsers do not check for revocation), it is that which generally constrains this goal to both unauthorized *and* undetected; a detected adversary can be dealt with through revocation.
I could argue that the first is a special case of the second, except that I want to distinguish between "cryptographic functions over the valid lifetime of the certificate" and "cryptographic functions after the certificate is supposed to be gone."

As an aside, I could also argue that these goals are insufficient; after all, except for doing man in the middle attacks, *any* SSL certificate signed by any of the many certificate authorities in the browser store would enable an adversary to cause the first of the losses. HSMs don't really help with that problem.

Given that caveat, what are the interesting adversaries? I propose four "interesting" adversaries, mostly defined by their powers:
  • The adversary who has remotely compromised a server;
  • The adversary who has taken physical control of a server which is still online;
  • The adversary who has taken physical control of a server at end of life; and
  • The adversary who has been given administrative access to a system.
The moves available to these adversaries are clear:
  • Copy key material (anyone with administrative access);
  • Change which key material or SSL configuration we'll use (thus downgrading the integrity of legitimate connections)
  • Escalate privileges to administrative access (anyone with physical or remote access); and
  • Make API calls to execute cryptographic functions (anyone with administrative access).
What controls will affect these adversaries?
  • Use of an HSM will inhibit the copying of keying material;
  • Use of revocation will reduce the exposure of copied keying material;
  • System-integrated physical security (systems that evaluate their own cameras and cabinets, for instance) inhibit escalation from physical access to administrative access;
  • Auditing systems inhibits adversary privilege escalation;
  • Encrypting keying material, and only providing decrypted versions to audited, online systems inhibits adversaries with physical control of systems.
What I find interesting is that for systems outside the physical purview of a company, HSMs may have a subtle flaw: since HSMs must provide an API to be of use, *that API remains exposed to an adversary who has taken possession of an HSM*. This may be a minor issue if an HSM is in a server in a "secure" facility, it becomes significant in distributed data centers. On the contrary, the control system which includes tightly coupled local physical security, auditing, and software encryption may strike a different balance: slightly less stringent security against an adversary who can gain administrative access (after all, they can likely copy the keys), in exchange for greater security against adversaries who have physical access.

This isn't to say that this is the only way to assemble a control system to protect SSL keys; merely that a reflexive jump to an HSM-based solution may not actually meet the security goals that many companies might have.

(Full disclosure: I’m the primary inventor of Akamai’s SSL content delivery network, which has incorporated software-based key management for over a decade.)
cross posted on The Akamai Blog.

Cognitive Injection

Context: I’m giving a talk today at noon at DerbyCon, entitled “Cognitive Injection: Reprogramming the Situation Oriented Human OS”. Slides are here.

It's a trope among security professionals that other humans - mere mundanes - don't 'get' security, and make foolish decisions. But this is an easy out, and a fundamental attribution error. Everyone has different incentives, motivators, and even perceptions of the world. By understanding this -- and how the human wetware has evolved over the the last fifty thousand years or so -- we can redesign our security programs to better manipulate people.

Assessment of the BREACH vulnerability

The recently disclosed BREACH vulnerability in HTTPS enables an attack against SSL-enabled websites. A BREACH attack leverages the use of HTTP-level compression to gain knowledge about some secret inside the SSL stream, by analyzing whether an attacker-injected "guess" is efficiently compressed by the dynamic compression dictionary that also contains the secret. This is a type of an attack known as an oracle, where an adversary can extract information from an online system by making multiple queries to it.

BREACH is interesting in that it isn't an attack against SSL/TLS per se; rather, it is a way of compromising some of the secrecy goals of TLS by exploiting an application that will echo back user-injected data on a page that also contains some secret (a good examination of a way to use BREACH is covered by Sophos). There are certain ways of using HTTPS which make this attack possible, and others which merely make the attack easier.

Making attacks possible

Impacted applications are those which:
  • Include in the response body data supplied in the request (for instance, by filling in a search box);
  • Include in the response some static secret (token, session ID, account ID); and
  • Use HTTP compression.
For each of these enabling conditions, making it untrue is sufficient to protect a request. Therefore, never echoing user data, having no secrets in a response stream or disabling compression are all possible fixes. However, making either of the first two conditions false is likely infeasible; secrets like Cross-Site Request Forgery (CSRF) tokens are often required for security goals, and many web experiences rely on displaying user data (hopefully sanitized to prevent application injection attacks). Disabling compression is possibly the only "foolproof" and straightforward means of stopping this attack simply - although it may be sufficient to only disable compression on responses with dynamic content. Responses which do not change between requests do not contain a user-supplied string, and therefore should be safe to compress.

Disabling compression is likely to be expensive - some back-of-the-envelope numbers from Guy Podjarny, Akamai's CTO of Web Experience, suggest a significant performance hit. HTML compresses by a factor of around 6:1 - so disabling compression will increase bandwidth usage and latency accordingly. For an average web page, excluding HTML compression will likely increase the time to start rendering the page by around half a second for landline users, with an even greater impact for mobile users.

Making attacks easier

Applications are more easily attacked if they:
  • Have some predictability around the secrets; either by prepending fixed strings, or having a predictable start or end;
  • Are relatively static over time for a given user; and
  • Use a stream cipher.
This second category of enablers presents a greater challenge to evaluate solutions. Particularly challenging is the question of how much secrecy is gained and at what cost from each of them.

Altering secrets between requests is an interesting challenge - a CSRF token might be split into two dynamically changing values, which “add” together to form the real token (x * y = CSRF token). Splitting the CSRF token differently for each response ensures that an adversary can't pin down the actual token with an oracle attack. This may work for non-human parseable tokens, but what if the data being attacked is an address, phone number, or bank account number? Splitting them may still be possible (using JavaScript to reassemble in the browser), but the applications development cost to identify all secrets, and implement protections that do not degrade the user experience, seems unachievable.

Altering a page to be more dynamic, even between identical requests, seems possibly promising, and is certainly easier to implement. However, the secrecy benefit may not be as straightforward to calculate - an adversary may still be able to extract from the random noise some of the information they were using in their oracle. A different way to attack this problem might not be by altering the page, but by throttling the rate at which an adversary can force requests to happen. The attack still may be feasible against a user who is using wireless in a cafe all day, but it requires a much more patient adversary.

Shifting from a stream cipher to a block cipher is a simple change which increases the cost of setting up a BREACH attack (the adversary now has to “pad” attack inputs to hit a block size, rather than getting an exact response size). There is a slight performance hit (most implementations would move from RC4 to AES128 in TLS1.1).

Defensive options

What options are available to web applications?
  • Evaluate your cipher usage, and consider moving to AES128.
  • Evaluate whether supporting compression on dynamic content is a worthwhile performance/secrecy tradeoff.
  • Evaluate applications which can be modified to reduce secrets in response bodies.
  • Evaluate rate-limiting. Rate-limiting requests may defeat some implementations of this attack, and may be useful in slowing down an adversary.
How can Akamai customers use their Akamai services to improve their defenses?
  • You can contact your account team to assist in implementing many of these defenses, and discuss the performance implications.
  • Compression can be turned off by disabling compression for html objects. The performance implications of this change should be well understood before you make it, however (See the bottom of this post for specifics on one way to implement this change, limited only to uncacheable html pages, in Property Manager).
  • Rate-limiting is available to Kona customers.
  • Have your account team modify the cipher selections on your SSL properties.

Areas of exploration

There are some additional areas of interest that bear further research and analysis before they can be easily recommended as both safe *and* useful.
  • Padding response sizes is an interesting area of evaluation. Certainly, adding a random amount of data would at least help make the attack more difficult, as weeding out the random noise increase the number of requests an adversary would need to make. Padding to multiples of a fixed-length is also interesting, but is also attackable, as the adversary can increase the size of the response arbitrarily until they force the response to cross an interesting boundary. A promising thought from Akamai's Chief Security Architect Brian Sniffen is to pad the response by number of bytes based on the hash of the response. This may defeat the attack entirely, but merits further study.
  • An alternative to padding responses is to split them up. Ivan Ristic points us to Paul Querna's proposal to alter how chunked encoding operates, to randomize various response lengths.
  • It may be that all flavors of this attack are HTTPS responses where the referrer is from an HTTP site. Limiting defenses to only apply in this situation may be fruitful - for instance, only disabling HTML compression on an HTTPS site if the referrer begins with "http://". Akamai customers with Property Manager enabled can make this change themselves (Add a rule: Set the Criteria to "Match All": "Request Header", "Referer", "is one of", "http://*" AND "Response Cacheability", "is" "no_store"; set the Behaviors to "Last Mile Acceleration (Gzip Compression)", Compress Response "Never”. This requires you to enable wildcard values in settings.).

crossposted at

Environmental Controls at Planetary Scale

A common set of security control objectives found in standard frameworks (ISO 27002, FedRAMP, et al) focus on environmental controls. These controls, which might focus on humidity sensors and fire suppression, are designed to maximize the mean time between critical failure (MTBCF) of the systems inside a data center. They are often about reliability, not safety1; fixating on over-engineering a small set of systems, rather than building in fault tolerance.

Is the cost worth the hassle? If you run one data center, then the costs might worthwhile - after all, it’s only a few capital systems, and a few basis point improvements in MTBCF will likely be worth that hassle (both in operational false positives as well as deployment cost). But what if you operate in thousands of data centers, most of them someone else’s? The cost multiplies significantly, but the marginal benefit significantly decreases - as any given data center improvement only affects such a small portion of your systems. Each data center in a planetary scale environment is now as critical to availability as a power strip is to a single data center location. Mustering an argument to monitor every power strip would be challenging; a better approach is to have a drawer full of power strips, and replace ones that fail.

The same model applies at the planetary scale: with thousands of data centers all over the world (in most of which the operators already have other incentives to take care of environmental monitoring), a much more effective approach is to continue to focus on regional failover (data centers, metro regions, and countries go offline all the time), and only worry about issues within a data center when they become a noticeable problem.

1 Leveson, Nancy. Section 2.1, "Confusing Safety with Reliability", Engineering a Safer World, pp 7-14

crossposted at

A Brief History of Cryptography

As part of an educational video series, here’s a brief history of cryptography.

DNS reflection defense

Recently, DDoS attacks have spiked up well past 100 Gbps several times. A common move used by adversaries is the DNS reflection attack, a category of Distributed, Reflected Denial of Service (DRDos) attack. To understand how to defend against it, it helps to understand how it works.

How DNS works

At the heart of the Domain Name System are two categories of name server: the authoritative name server, which is responsible for providing authoritative answers to specific queries (like, which is one of the authoritative name servers for the domain), and the recursive name server, which is responsible for answering any question asked by a client. Recursive name servers (located in ISPs, corporations, and data centers around the world) query the appropriate authoritative name servers around the Internet, and return an answer to the querying client. An open resolver is a category of resolver that will answer recursive queries from any client, not just those local to them. Because DNS requests are fairly small and lightweight, DNS primarily uses the Universal Datagram Protocol (UDP), a stateless messaging system. Since UDP requests can be sent in a single packet, the source address are easily forgeable with any address desired by the true sender.

DNS reflection

A DNS reflection attack takes advantage of three things: the forgeability of UDP source addresses, the availability of open resolvers, and the asymmetry of DNS requests and responses. To conduct an attack, an adversary sends a set of DNS queries to open resolvers, altering the source address on their requests to be those of their chosen target. The requests are designed to have much larger responses (often, using an ANY request, a 64 byte request yields a 512-byte response), thus resulting in the recursive name servers sending about 8 times as much traffic at the target as they themselves received. A DNS reflection attack can directly use authoritative name servers, but it requires more preparation and research, making requests specific to the scope of each DNS authority used.

Eliminating DNS reflection attacks

An ideal solution would obviously be to eliminate this type of attack, rather than every target needing to defend themselves. Unfortunately, that’s challenging, as it requires significant changes by infrastructure providers across the Internet.


No discussion of defending against DRDoS style attacks is complete without a nod to BCP38. These attacks only work because an adversary, when sending forged packets, has no routers upstream filtering based on the source address. There is rare need to permit an ISP user to send packets claiming to originate in another ISP; if BCP38 were adopted and implemented in a widespread fashion, DRDoS would be eliminated as an adversarial capability. That’s sadly unlikely, as BCP38 enters its 14th year; the complexity and edge cases are significant.

The open resolvers

While a few enterprises have made providing an open resolver into a business (OpenDNS, GoogleDNS), many open resolvers are either historical accidents, or resulting from incorrect configuration. Even MIT has turned off open recursion on its high-profile name servers.
Barring that, recursive name servers should implement rate limiting, especially on infrequent request types, to reduce the multiplication of traffic that adversaries can gain out of them.


Until ISPs and resolver operators implement controls to limit how large attacks can become, attack targets must defend themselves. Sometimes, attacks are targeted at infrastructure (like routers and name servers), but most often they are being targeted at high-profile websites operated by financial services firms, government agencies, retail companies, or whoever has caught the eye of the attacker this week.
An operator of a high-profile web property can take steps to defend their front door. The first step, of course, should be to find their front door; and to understand what infrastructure it relies on. And then they can evaluate their defenses.


The first line of defense is always capacity. Without enough bandwidth at the front of your defenses, nothing else matters. This needs to be measurable both in raw bandwidth, as well as in packets per second, because hardware often has much lower bandwidth capacity as packet sizes shrink. Unfortunately, robust capacity is now measurable in the 300+ gigabits per second, well beyond the resources of the average datacenter. However, attacks in the 3-10 gigabit per second range are still common, and well within the range of existing datacenter defenses.


For systems that aren’t DNS servers themselves, filtering out DNS traffic as far upstream as possible is a good solution, but certainly at a border firewall. One caveat - web servers often need to make DNS queries themselves, so ensure that they have a path to do so. In general, the principal of “filter out the unexpected” is a good filtering strategy.

DNS server protection

Since DNS servers have to process incoming requests (an authoritative name server has to respond to all of the recursive resolvers around the Internet, for instance), merely filtering DNS traffic upstream isn’t an option. So what is perceived as a network problem by non-DNS servers becomes an application problem for the DNS server. Defenses may no longer be simple “block this” strategies; rather, defense can take advantage of application tools to provide different defenses.


While the total number of authoritative DNS server IP addresses for a given domain is limited (while 13 should fit into the 512-byte DNS response packet, generally, 8 is a reasonable number), many systems use nowhere near the limit. Servers should be diversified, located in multiple networks and geographies, ensuring that attacks against two name servers aren’t traveling across the same links.


Since requests come in via UDP, anycasting (the practice of having servers responding on the same IP address from multiple locations on the internet) is quite practical. Done at small scale (two to five locations), this can provide significant increases in capacity, as well as resilience to localized physical outages. However, DNS also lends itself to architectures with hundreds of name server locations sprinkled throughout the internet, each localized to only provide service to a small region of the Internet (possibly even to a single network). Adversaries outside these localities have no ability to target the sprinkled name servers, which continue to provide high quality support to nearby end users.


Based on Akamai’s experience running popular authoritative name servers, 95% of all DNS traffic originates from under a million popular name server IP addresses (to get 99% requires just under 2 million IP addresses). Given that the total IPv4 address space is around 4.3 billion IP addresses, name servers can be segregated; a smaller number to handle the “unpopular” name servers, and a larger amount to handle the popular name servers. Attacks that reflect of unpopular open resolvers thus don’t consume the application resources providing quality of service to the popular name service.

Response handling

Authoritative name servers should primarily see requests, not responses. Therefore, they should be able to isolate, process, and discard response packets quickly, minimizing impact to resources engaged in replying to requests. This isolation can also apply to less frequent types of request, such that when a server is under attack, it can devote resources to requests that are more likely to provide value.

Rate Limiting

Traffic from any name server should be monitored to see if it exceeds reasonable thresholds, and, if so, aggressively managed. If a name server typically sends a few requests per minute, having name servers not answer most requests from a name servers requesting dozens of time per second (these thresholds can and should be dynamic). This works because of the built in fault tolerance of DNS; if a requesting name server doesn’t see a quick response, it will send another request, often to a different authoritative name server (and deprioritizing the failed name server for future requests).

As attacks grow past the current few hundred gigabit-per-second up to terabit-per-second attacks, robust architectures will be increasingly necessary to maintains a presence on the Internet in the face of adversarial action.

crossposted at

SOURCE Boston Talk

This month I gave the Thursday keynote (slides) at SOURCE Boston 2013. SOURCE is one of my favorite conferences to be at - primarily because of the high density of passionate security practitioners in attendance.

There is some nice coverage of some other Akamai talks given at SOURCE as well. If you think this works sounds cool, we’re hiring!

How big is 300 Gbps, really?

The 300 Gbps attack this week against SpamHaus certainly seems epic. But how big is it, really? When we think about an attack an Akamai, we think about three things: the attacker’s capacity, their leverage, and the target’s capacity. And when we think about leverage, it’s really comprised of two smaller pieces: how much cost efficiency the attacker expects to get, and how the target’s resilience mitigates it.

300 Gbps isn’t that bad when it’s restricted to reflected DNS traffic - if you have enough capacity to ingest the packets, they’re pretty trivial to drop, and, until your network cards fill up, are less effective than a SYN flood. So why would an attacker resort to such an inefficient attack? The attacker likely doesn’t have 300 Gbps in their botnet - they probably have somewhere in the range of 30 to 60 Gbps. Attacks through DNS resolvers are amplified - so the attacker can create a larger attack than they might have otherwise, at the cost of reducing their leverage.

In comparison the BroBot botnets are routinely tossing around 30 Gbps attacks, with peaks upwards of 80 Gbps. Because they’re willing to sacrifice their hosts, they have a wider range of attacks available to them. Commonly, they send HTTPS request floods - requiring their targets to negotiate full SSL connections, parse an HTTP request, and determine whether they’ll deliver a reply or not. BroBot could certainly throw around a bit more bandwidth with DNS reflection - but against most of their targets, it would have less effect than some of their current tactics.

It’s hard to compare the two. If you have less than 60 Gbps of raw bandwidth lying around, they’re both the same (you’ll succumb either way). If you have more than 60 and less than 300 Gbps, BroBot is more palatable, although you need a lot more CPU to handle it. But above 300Gbps of bandwidth? The attack on SpamHaus is much, much easier to deal with.