Assessment of the BREACH vulnerability

The recently disclosed BREACH vulnerability in HTTPS enables an attack against SSL-enabled websites. A BREACH attack leverages the use of HTTP-level compression to gain knowledge about some secret inside the SSL stream, by analyzing whether an attacker-injected "guess" is efficiently compressed by the dynamic compression dictionary that also contains the secret. This is a type of an attack known as an oracle, where an adversary can extract information from an online system by making multiple queries to it.

BREACH is interesting in that it isn't an attack against SSL/TLS per se; rather, it is a way of compromising some of the secrecy goals of TLS by exploiting an application that will echo back user-injected data on a page that also contains some secret (a good examination of a way to use BREACH is covered by Sophos). There are certain ways of using HTTPS which make this attack possible, and others which merely make the attack easier.

Making attacks possible

Impacted applications are those which:
  • Include in the response body data supplied in the request (for instance, by filling in a search box);
  • Include in the response some static secret (token, session ID, account ID); and
  • Use HTTP compression.
For each of these enabling conditions, making it untrue is sufficient to protect a request. Therefore, never echoing user data, having no secrets in a response stream or disabling compression are all possible fixes. However, making either of the first two conditions false is likely infeasible; secrets like Cross-Site Request Forgery (CSRF) tokens are often required for security goals, and many web experiences rely on displaying user data (hopefully sanitized to prevent application injection attacks). Disabling compression is possibly the only "foolproof" and straightforward means of stopping this attack simply - although it may be sufficient to only disable compression on responses with dynamic content. Responses which do not change between requests do not contain a user-supplied string, and therefore should be safe to compress.

Disabling compression is likely to be expensive - some back-of-the-envelope numbers from Guy Podjarny, Akamai's CTO of Web Experience, suggest a significant performance hit. HTML compresses by a factor of around 6:1 - so disabling compression will increase bandwidth usage and latency accordingly. For an average web page, excluding HTML compression will likely increase the time to start rendering the page by around half a second for landline users, with an even greater impact for mobile users.

Making attacks easier

Applications are more easily attacked if they:
  • Have some predictability around the secrets; either by prepending fixed strings, or having a predictable start or end;
  • Are relatively static over time for a given user; and
  • Use a stream cipher.
This second category of enablers presents a greater challenge to evaluate solutions. Particularly challenging is the question of how much secrecy is gained and at what cost from each of them.

Altering secrets between requests is an interesting challenge - a CSRF token might be split into two dynamically changing values, which “add” together to form the real token (x * y = CSRF token). Splitting the CSRF token differently for each response ensures that an adversary can't pin down the actual token with an oracle attack. This may work for non-human parseable tokens, but what if the data being attacked is an address, phone number, or bank account number? Splitting them may still be possible (using JavaScript to reassemble in the browser), but the applications development cost to identify all secrets, and implement protections that do not degrade the user experience, seems unachievable.

Altering a page to be more dynamic, even between identical requests, seems possibly promising, and is certainly easier to implement. However, the secrecy benefit may not be as straightforward to calculate - an adversary may still be able to extract from the random noise some of the information they were using in their oracle. A different way to attack this problem might not be by altering the page, but by throttling the rate at which an adversary can force requests to happen. The attack still may be feasible against a user who is using wireless in a cafe all day, but it requires a much more patient adversary.

Shifting from a stream cipher to a block cipher is a simple change which increases the cost of setting up a BREACH attack (the adversary now has to “pad” attack inputs to hit a block size, rather than getting an exact response size). There is a slight performance hit (most implementations would move from RC4 to AES128 in TLS1.1).

Defensive options

What options are available to web applications?
  • Evaluate your cipher usage, and consider moving to AES128.
  • Evaluate whether supporting compression on dynamic content is a worthwhile performance/secrecy tradeoff.
  • Evaluate applications which can be modified to reduce secrets in response bodies.
  • Evaluate rate-limiting. Rate-limiting requests may defeat some implementations of this attack, and may be useful in slowing down an adversary.
How can Akamai customers use their Akamai services to improve their defenses?
  • You can contact your account team to assist in implementing many of these defenses, and discuss the performance implications.
  • Compression can be turned off by disabling compression for html objects. The performance implications of this change should be well understood before you make it, however (See the bottom of this post for specifics on one way to implement this change, limited only to uncacheable html pages, in Property Manager).
  • Rate-limiting is available to Kona customers.
  • Have your account team modify the cipher selections on your SSL properties.

Areas of exploration

There are some additional areas of interest that bear further research and analysis before they can be easily recommended as both safe *and* useful.
  • Padding response sizes is an interesting area of evaluation. Certainly, adding a random amount of data would at least help make the attack more difficult, as weeding out the random noise increase the number of requests an adversary would need to make. Padding to multiples of a fixed-length is also interesting, but is also attackable, as the adversary can increase the size of the response arbitrarily until they force the response to cross an interesting boundary. A promising thought from Akamai's Chief Security Architect Brian Sniffen is to pad the response by number of bytes based on the hash of the response. This may defeat the attack entirely, but merits further study.
  • An alternative to padding responses is to split them up. Ivan Ristic points us to Paul Querna's proposal to alter how chunked encoding operates, to randomize various response lengths.
  • It may be that all flavors of this attack are HTTPS responses where the referrer is from an HTTP site. Limiting defenses to only apply in this situation may be fruitful - for instance, only disabling HTML compression on an HTTPS site if the referrer begins with "http://". Akamai customers with Property Manager enabled can make this change themselves (Add a rule: Set the Criteria to "Match All": "Request Header", "Referer", "is one of", "http://*" AND "Response Cacheability", "is" "no_store"; set the Behaviors to "Last Mile Acceleration (Gzip Compression)", Compress Response "Never”. This requires you to enable wildcard values in settings.).

crossposted at

Environmental Controls at Planetary Scale

A common set of security control objectives found in standard frameworks (ISO 27002, FedRAMP, et al) focus on environmental controls. These controls, which might focus on humidity sensors and fire suppression, are designed to maximize the mean time between critical failure (MTBCF) of the systems inside a data center. They are often about reliability, not safety1; fixating on over-engineering a small set of systems, rather than building in fault tolerance.

Is the cost worth the hassle? If you run one data center, then the costs might worthwhile - after all, it’s only a few capital systems, and a few basis point improvements in MTBCF will likely be worth that hassle (both in operational false positives as well as deployment cost). But what if you operate in thousands of data centers, most of them someone else’s? The cost multiplies significantly, but the marginal benefit significantly decreases - as any given data center improvement only affects such a small portion of your systems. Each data center in a planetary scale environment is now as critical to availability as a power strip is to a single data center location. Mustering an argument to monitor every power strip would be challenging; a better approach is to have a drawer full of power strips, and replace ones that fail.

The same model applies at the planetary scale: with thousands of data centers all over the world (in most of which the operators already have other incentives to take care of environmental monitoring), a much more effective approach is to continue to focus on regional failover (data centers, metro regions, and countries go offline all the time), and only worry about issues within a data center when they become a noticeable problem.

1 Leveson, Nancy. Section 2.1, "Confusing Safety with Reliability", Engineering a Safer World, pp 7-14

crossposted at