Why isn't HTTPS everywhere yet?

Encryption. We all like it and want more of it. Why isn't HTTPS everywhere yet?

Is it certificates?

The first and most commonly cited barrier to HTTPS everywhere is the cost (both time and effort) of obtaining, configuring, presenting and maintaining a valid certificate. You must find a certificate authority, prove your identity, pay for a certificate, set it up on your server (you probably need to have a dedicated IPv4 address to support old clients that don't do Server Name Indication, but that's a different issue) and renew it before it expires.

Most proposals for "opportunistic" encryption are targeted at this level of the problem, and amount to: "The NSA is recording all of our traffic, so why don't we just encrypt even if we don't have a certificate?" The goal of such proposals is to raise the cost of passive bulk surveillance, not to address more sophisticated or targeted attacks by active network attackers.

I appreciate the spirit of this solution, but I have come to believe it is actually targeting the wrong obstacles. First, because this flavor of opportunistic encryption lacks authentication even when it appears to succeed, it doesn't offer any guarantees that other resources can build meaningful security contracts with. Since users and resource authors who also care about active attackers can never rely on it, even optimistically, it doesn't remove any obstacles or provide a meaningfully improved upgrade path to a Web that is comprehensively secure against both active and passive adversaries. Second, I don't think the certificate problem is actually the biggest challenge blocking HTTPS everywhere anymore.

The fine folks over at Let's Encrypt already realized that the certificate problem is mostly amenable to automation, and that automating the issuance, installation, configuration, deployment and renewal of certificates and TLS on a relatively small number of major servers and platforms can cover a substantial supermajority of the Internet. This is amazing work, and while there is still much of that race left to be run, I think we can optimistically call it a solved problem.

Now that we have a certificate, we can turn on HTTPS, right?

Well, maybe. But probably not. If all of the HTML resources you serve load their subresources (images, scripts, etc.) from the same host AND use only relative URLs (e.g. "/foo.jpg" instead of "http://example.com/foo.jpg"), then you are good to go! Otherwise, things are probably broken for you at your "https://" URLs.

Wait, what? HTTPS is supposed to be good. Why is everything broken?

It's broken because you almost certainly have Mixed Content in your pages.

What is Mixed Content and why should I care?

Mixed Content is the term that has arisen out of practice for the situation when a document loaded over https includes content loaded over http.

Consider the following set of resources, where you are the operator of "OriginA". Green circles represent resources available over both https and http schemes, red circles represent resources only available over the http scheme. Dashed lines represent subresource includes made with an http scheme. Solid lines represent subresource includes made with an https scheme. Assume that all links are absolute, not relative. A red "X" indicates a resource load that will fail.

Now let's say you acquire a certificate and turn on https for OriginA. What does the resource graph look like now?

If you haven't updated absolute links in the HTML document, it will still be attempting to load all of its resources with an http scheme. Most web browsers block such loads or give negative UI feedback when they occur, and they have been getting more and more strict about this over time. The HTML resource at OriginA will be broken because attempts to load the three JS resources it depends on will be blocked, and even without the JS resources it would still be marked with a mixed content warning because of the JPG. Note that even though some resources are available over https, because the existing links use the http scheme, they will still fail.

Why do browsers block mixed content loads? Why can't I as a site operator make that decision?

In most of the Web security model, Origins (the scheme+host+port tuple of a URL) are authoritative for their own information. A document loaded from HTTPS can navigate the user to an insecure destination with sensitive data in the query string or fragment, it can POST or postMessage() to insecure schemes or origins, and an https resource can receive GETs, POSTs or postMessage() calls from documents loaded over http. So, if we don't have formal information flow controls on the Web, why is Mixed Content Blocking a thing? If I can POST, why can't I XHR?

It turns out there is one formal security property that browsers try to enforce on documents: a property first formulated as "Tranquility" by Bell and LaPadula in their 1973 confidentiality model. In simple terms, the tranquility enforced by web browsers is that a secure document will not become insecure while you are interacting with it.

Among all the complexity and potential pitfalls of Web security, browsers have come to the conclusion that there is only one semi-reliable and usable security indicator: the URL bar and the HTTPS lock. If you type "https://" into the address bar, or at any point check to see if there is a lock icon for the document you are interacting with, the browser has made you a promise that the content is protected from threats involving a hostile network. Another way to look at it is that the browser is conveying a promise from the site operator to the user, a promise it is unwilling to let the site renege on. Site operators may also expect and rely on browsers to fulfill this promise, so that one typo or missed link by a single developer somewhere doesn't undo all the hard work they've done to provide a secure experience for their users.

If your https document were to load a script, perform a fetch or even load an image over http, that promise would be broken. How broken and what the exact consequences would be might vary widely, but the browser isn't in a position to know, and doesn't want to impose the burden of that subtlety on the user, so they simply block. This is done not only to protect users who wouldn't otherwise be aware that, e.g., a webmail application which shows https in the address bar but includes script over http isn't safe to use in a coffeeshop in the hacker part of town, but also to highlight the same issue to content authors who might themselves otherwise miss this subtle point.

This is a good thing for users, but it puts site operators newly in possession of a certificate in a difficult position when it comes to turning on https. All of their HTML resources which reference insecure content are going to break outright or alarm some population of users, either by showing a warning or simply not showing the lock they expect for https. This dependency problem, I assert, is the real cost barrier we must surmount in order to move the rest of the Web to 100% HTTPS.

Fixing Mixed Content

As soon as you have to fix Mixed Content, the cost of migrating to HTTPS starts to get real. Much more so than any kind of server configuration or acquisition of a certificate, removing mixed content is expensive and not always well-suited to automation because it often requires understanding all the possible content served by a host, modifying it (conditionally) and (possibly most expensively) testing to verify that everything works.

For a complex site, it's not as simple as running s/http/https/g across all of your resources, on disk or with a mod_rewrite rule. You might encounter http-schemed resources in many places: static content, dynamic content generated on the server or on the client, content stored in databases (on the server or the client), content retrieved from third parties or via redirects, etc.

A new specification under development (and already supported by Chrome and Firefox), Upgrade Insecure Requests, aims to ease this burden by automatically upgrading http subresource fetches, along with same-origin navigations, to https.

Before

After Upgrade-Insecure-Requests

Upgrade-Insecure-Requests has helped OriginA here. One of its HTML resources is now free of mixed content because all of the link schemes have been transparently upgraded and all of its remote dependencies are available over https. However, we still have one broken resource because the JS dependency from OriginB is still not available over https. This illustrates why, until all of their dependencies have upgraded to https, many sites are reluctant to offer any of their own content over https: they want to avoid presenting users with broken experiences and worrying error messages.
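For a site that controls its server configuration, opting in is a single response header and requires no content changes. Here is a minimal sketch in Go (any server that can set a header works the same way); the certificate and file paths are placeholders:

```go
// Minimal sketch of opting a site into Upgrade-Insecure-Requests from a Go
// server: one response header, with no changes to the HTML itself.
package main

import "net/http"

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Ask supporting browsers to rewrite this page's http:// subresource
		// fetches (and same-origin navigations) to https:// before requesting.
		w.Header().Set("Content-Security-Policy", "upgrade-insecure-requests")
		http.ServeFile(w, r, "index.html")
	})
	// cert.pem / key.pem are placeholders for a real certificate and key.
	http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
}
```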

This leads to the unfortunate circumstance that the least accountable actors at the end of long dependency chains can hold back progress for everyone upstream. Cyclical dependencies (as certainly exist in the large scale structure of the web) can create deadlocks which totally prevent upgrades without coordination.

None of these sites can turn on https

To make matters even worse, it is not trivial to even determine if your dependencies are ready! Once you go HTTPS, errors will just start happening for your users, and you have no obvious way to catch them in advance. (A "default-src https:" Content Security Policy directive can tell you when things actually broke, but you can't easily compose it with an optimistic upgrade to test without actual breakage.)
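For concreteness, here is a rough Go sketch of the reporting approach that parenthetical refers to: serve the https version with a report-only policy and collect violation reports at a report endpoint (the /csp-reports path here is made up). The limitation stands: reports only arrive once real users are hitting real breakage.

```go
// Sketch of detecting insecure dependencies with a report-only CSP.
// The policy itself blocks nothing, but mixed content blocking still
// applies, which is why this only tells you about breakage after the fact.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Report any non-https load without adding new enforcement.
		w.Header().Set("Content-Security-Policy-Report-Only",
			"default-src https:; report-uri /csp-reports")
		http.ServeFile(w, r, "index.html")
	})
	http.HandleFunc("/csp-reports", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		log.Printf("CSP report: %s", body) // typically names the http:// URL that broke
	})
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil))
}
```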

For modern applications with complex client-side logic, serving large user bases, and using things like Real-Time-Bidding advertising networks, the difficulty of creating a reasonable simulation of traffic and user experience for test purposes is quite real. Just the set of domain names you reference may be emergent runtime behavior with substantial variance over time.

Breaking the deadlock

What we lack is an intermediate state between http and https. Ideally, such a state would have the following properties:

  1. Allow secure origins which depend on resources you serve to retrieve them in a way which does not violate their secure tranquility.
  2. Do not force resources to make premature or unverified guarantees of secure tranquility / lack of mixed content.
  3. Be extremely low-cost and low-risk to deploy; ideally requiring zero content-level changes, only server configuration updates. (including possibly adding one or more http headers)
  4. Allow detection of dependencies which would violate secure tranquility / produce mixed content errors without negatively impacting the user experience.

Click on the figure below to see how introducing an intermediate state (indicated with blue) with these properties can break an upgrade deadlock cycle.

First, OriginB turns on "https-transitional" mode. This means that its resources are still not available at URLs with an https scheme, but are optimistically available over TLS with the full guarantees, including a valid certificate. This is essentially zero-cost for OriginB because it makes no new guarantees to users or browsers about the security state or tranquility of its own resources.

Now that OriginB's resources are available via "https-transitional", OriginA turns on https. It has an HTML file with an http dependency on the JS file at OriginB. A browser that knows about "https-transitional" can try to initiate a TLS connection to OriginB and ask for the resource with its original http scheme. If this optimistic upgrade fails, it would be treated as insecure and trigger mixed content blocking, so we haven't reduced the guarantees to users of OriginA. If it succeeds, all of the standard guarantees required for OriginA to be secure and tranquil are met, and no mixed content warning or blocking would be triggered, even though the reference to the JS file at OriginB still used the http scheme.

Now that OriginA has upgraded, OriginC can upgrade, too. OriginB still has a dependency on OriginD which is only available over http, so it can't yet go to https itself, but by turning on transitional mode it has unblocked upgrades for A and C, without creating any negative user experiences for its own users due to mixed content from D. Without the transitional state, none of these sites could have upgraded.

The interesting bit here is the link from OriginA's HTML resource to OriginB's JS resource. It needs to satisfy all the properties of TLS - but whether it remains an http scheme that is transparently upgraded during the fetch from OriginB, or whether it is upgraded at OriginA before a fetch is even attempted depends on how this transitional state might be implemented.

How could we introduce this state?

A first take, just to break resource dependency deadlocks without introducing documents with mixed content into the world, would be to enable https with a server filter that returns a 404 whenever it would otherwise return a Content-Type of text/html. In fact, it's probably a surprisingly good approximation, modulo some edge cases involving CORS. Of the four properties we are interested in, this even gets us a pretty good take on three of them.
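A minimal sketch of that filter, assuming a Go server fronting the site (the same idea applies to any reverse proxy or server module): buffer each response and replace anything with a text/html Content-Type with a 404, so only subresources are exposed over https during the transition.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// htmlBlocker wraps a handler and replaces any text/html response with a
// 404, so only subresources (scripts, images, styles, ...) are served
// over https during the transition.
func htmlBlocker(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := httptest.NewRecorder() // buffer the response to inspect its type
		next.ServeHTTP(rec, r)
		if strings.HasPrefix(rec.Header().Get("Content-Type"), "text/html") {
			http.NotFound(w, r) // don't offer documents over https yet
			return
		}
		for k, v := range rec.Header() {
			w.Header()[k] = v
		}
		w.WriteHeader(rec.Code)
		w.Write(rec.Body.Bytes())
	})
}

func main() {
	site := http.FileServer(http.Dir("./site"))
	go http.ListenAndServe(":80", site) // the full site stays available over plain http
	// Over TLS, serve everything except HTML documents.
	http.ListenAndServeTLS(":443", "cert.pem", "key.pem", htmlBlocker(site))
}
```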

What is missing is the 4th property - the ability to detect the state of your own dependencies so you can know when it's OK to flip the "real https" switch. What if browsers could do Content-Security-Policy-Report-Only with "upgrade-insecure-requests" for http documents?

  1. attempt to upgrade
  2. if upgrade fails
    1. fall back to http fetch
    2. fire a report

Is it possible we don't need anything more than that? Basically this configuration, with Upgrade-Insecure-Requests implied for A, B, and C?
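On the wire, that hypothetical configuration might be nothing more than a header on the plain-http responses, something like the sketch below. To be clear, browsers do not currently honor upgrade-insecure-requests in report-only mode; that gap is exactly what the question above is pointing at, and the report endpoint name is made up.

```go
// Hypothetical: ask browsers to probe upgrades for an http document and
// report failures, without ever changing what the user sees. Browsers do
// not implement this combination today.
package sketch

import "net/http"

func probeUpgrades(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Security-Policy-Report-Only",
		"upgrade-insecure-requests; report-uri /upgrade-reports")
}
```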

Enter the <iframe>

Unfortunately, no, just filtering HTML resources from being offered over standard https isn't enough; it fails with HTML to HTML dependencies from iframes. This is actually quite common for advertising.

What we need to unblock a dependency tree for iframes is also optimistic secure tranquility. That is, we must be able to attempt to optimistically load an HTML resource with a TLS upgrade, and enforce tranquility on it, but only from a secure, framed context. The key to why this is OK is that a non-upgraded document would already be broken when loaded in this way, so we aren't introducing any new breakage or mixed content experiences for users of OriginB.

https-transitional with ALPN and HTTP Alt-Svc

ALPN allows a client interacting with a server over TLS to indicate a different application layer protocol it wishes to use.

Let's imagine a new ALPN protocol type: "https-transitional". A server which sees a client request for that protocol understands it as: "connect me to the resources and configuration served over http, but over TLS, not the https site, please". (The state of firewalls and other middleboxes effectively prevents the introduction of a new port with any expectation of wide compatibility.) As described by the HTTP Alt-Svc draft, a full and successful TLS connection must be completed, and the server must present a certificate which matches the original http hostname. Unlike resources fetched via the Alt-Svc mechanism, resources fetched as transitional will behave slightly differently.
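To make that concrete, here is a rough Go sketch of a server offering the proposed protocol via ALPN alongside ordinary HTTP/1.1. The "https-transitional" identifier is, of course, not a registered ALPN protocol; everything here is illustrative.

```go
// Sketch of advertising a hypothetical "https-transitional" ALPN protocol.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	srv := &http.Server{
		Addr: ":443",
		TLSConfig: &tls.Config{
			// Offer the (hypothetical) transitional protocol; clients that
			// don't know it negotiate ordinary http/1.1 over TLS instead.
			NextProtos: []string{"https-transitional", "http/1.1"},
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.TLS != nil && r.TLS.NegotiatedProtocol == "https-transitional" {
				// Serve the http-scheme configuration of the site, over TLS.
				fmt.Fprintln(w, "transitional response")
				return
			}
			fmt.Fprintln(w, "ordinary https response")
		}),
	}
	// The certificate must be valid for the original http hostname.
	srv.ListenAndServeTLS("cert.pem", "key.pem")
}
```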

A resource fetched over "https-transitional" will have the following properties in the user agent:

Upgrade-Insecure-Requests would be modified in the following way:

Mixed Content blocking would also be modified. User agents do not currently attempt to automatically upgrade http -> https in contexts where mixed content is blocked, because there is no explicit guarantee that the resources at both schemes are semantically equivalent. The "https-transitional" mechanism does explicitly make that equivalence guarantee, so a user agent can always automatically attempt to upgrade any request which would be blocked as insecure mixed content to "https-transitional" without any hint from the server.

Servers could also advertise the availability of transitional mode with the HTTP Alt-Svc header. If a user agent has seen an Alt-Svc advertisement for an Origin that is still fresh, it can always make the upgrade, even for navigational requests, giving the benefits of opportunistic encryption against passive adversaries.
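An advertisement of that kind might look roughly like the sketch below, sent from the plaintext listener. Again, the protocol identifier is hypothetical and the max-age value is arbitrary.

```go
// Hypothetical Alt-Svc advertisement for transitional mode, sent over plain
// http. "https-transitional" is this proposal's name, not a registered
// alternative-service protocol.
package sketch

import "net/http"

func advertiseTransitional(w http.ResponseWriter, r *http.Request) {
	// "This origin is also reachable over TLS on port 443 using the
	// transitional protocol; cache that fact for up to a day (ma=86400)."
	w.Header().Set("Alt-Svc", `https-transitional=":443"; ma=86400`)
}
```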

When a document has been loaded in transitional mode, the user agent should attempt to upgrade all resources which would have been blocked as mixed content, as if upgrade-insecure-requests had been set, but must silently retry over http if the upgrade fails. It must report an error on the console in such cases and SHOULD provide a means for sites to request reports, similar to (or re-using) the Content Security Policy mechanism. This will allow operators to understand, from genuine user traffic, whether all of their dependencies are upgradable, as part of a transition to full https.

The above figure represents the implications of these rules. Resources loaded from OriginA which iframe resources from OriginB will never have mixed content warnings, because OriginB supports transitional mode. However, if resources from OriginB are loaded in iframes by documents that block mixed content, and depend on resources which cannot be upgraded (such as the JS resource on OriginD), those fetches will silently be blocked to maintain the tranquility of the ancestor resources. This partial breakage is still preferable to the current state, where the initial load from OriginA would have been completely blocked. A resource loaded as a direct navigation from OriginB which depends on the same non-upgradable resource at OriginD would not see the load blocked, because it does not inherit a tranquility contract from a frame ancestor.
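For illustration, here is a loose Go approximation of the upgrade-then-fall-back behavior described above. Real user agents implement this inside their network stacks, and the actual upgrade goes over the transitional protocol rather than a literal scheme rewrite, but the control flow is the same. The function and parameter names are made up.

```go
package sketch

import (
	"net/http"
	"strings"
)

// fetchTransitional approximates the upgrade with a scheme rewrite; the
// real mechanism keeps the http scheme and upgrades the transport instead.
func fetchTransitional(httpURL string, report func(error)) (*http.Response, error) {
	upgraded := strings.Replace(httpURL, "http://", "https://", 1)
	resp, err := http.Get(upgraded)
	if err == nil {
		return resp, nil // upgrade succeeded: tranquility is preserved
	}
	report(err)              // console error / operator report, invisible to the user
	return http.Get(httpURL) // silent fallback keeps the page working
}
```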

Performance Impacts

For https resources, this proposal shouldn't introduce any new latency compared to using Upgrade-Insecure-Requests. The ALPN negotiation makes it totally transparent. Browsers will be attempting TLS first, anyway, and it won't require any additional round trips to determine if https-transitional is supported.

The only place where there may be a performance impact is when a navigational request is upgraded due to the presence of an Alt-Svc header. Because the user agent will attempt to recursively upgrade all subresources of an upgraded document, which may not all be available upgraded or over https, this has the possibility of introducing significant new latency. User agents might reduce this by remembering for some period of time that TLS was unavailable for a given origin, or they might attempt upgraded and non-upgraded fetches in parallel in order to instantly substitute the http version on an aggressive timeout for secure connection establishment. This latter approach is bad for privacy in the presence of passive attackers. Some experimentation is probably necessary here to arrive at the best strategy. Performance costs will be higher when transitional mode is new, and will consistently decline as it becomes more widely available.
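As a sketch of that timeout strategy (again in Go purely for illustration): give the upgraded fetch a small latency budget and fall back to the plain http fetch if a secure response hasn't arrived in time. The budget value and names are made up, and this is an approximation rather than a full parallel race.

```go
package sketch

import (
	"net/http"
	"strings"
	"time"
)

// racedFetch gives the upgraded fetch a latency budget and falls back to
// plain http if a secure response hasn't arrived in time.
func racedFetch(httpURL string, budget time.Duration) (*http.Response, error) {
	type result struct {
		resp *http.Response
		err  error
	}
	upgraded := strings.Replace(httpURL, "http://", "https://", 1)
	ch := make(chan result, 1)
	go func() {
		resp, err := http.Get(upgraded)
		ch <- result{resp, err}
	}()
	select {
	case r := <-ch:
		if r.err == nil {
			return r.resp, nil // secure fetch completed within budget
		}
	case <-time.After(budget):
		// Secure connection too slow; give up on the upgrade for this load.
	}
	return http.Get(httpURL)
}
```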

Locking down

At some point, sites will still want to abandon plaintext http. A load from an https context will never fail down to plaintext or an unauthenticated connection, but optimistic upgrades are not, by themselves, enough to stop an active network attacker from forcing a downgrade back to an unencrypted connection for navigations or top-level resources loaded transitionally. During the transitional period, it's an explicit goal to never break if the upgrade can't succeed. How do we get from transitional to finally secure, especially if the existence of transitional is a means to avoid sites having to change the scheme of every http link out there? Related to this, we also eventually want to be able to enforce tranquility on upgraded resources to stop those same attackers.

We can start with some design patterns from HTTP Strict Transport Security, which was invented to solve this problem for HTTPS sites that wanted to fully deprecate HTTP, as well as the Alt-Svc header mentioned earlier.

The first thing a site could do is set the freshness of its Alt-Svc advertisement to "infinite". This would be a signal to user agents which understand it to never attempt a plaintext connection to that Origin again - only use https-transitional and https, forever. User agents might, as they do today, allow sites to opt-in to a preload of this setting. After doing this, a service could continue to offer service over plaintext for legacy clients, or it could turn it off if it determines that traffic is acceptably low.

The next thing is that content served with transitional mode needs a way to opt in to tranquility. This is also probably best handled by an HTTP header. Setting the "tranquil" bit would imply, at least, that the document's settings object should restrict mixed content. It might even allow such a resource to be eligible to be same-origin with https content, though that requires additional analysis, and probably would be a different flag, so sites can make appropriate transitions of cookies, local storage and other Origin-scoped state on the client. A tranquil resource might also be eligible to get the "lock" or whatever other secure UI treatment is applied to https.

Eventually, if penetration of transitional mode is sufficient, user agents could start dropping support for http. Perhaps first as a user preference flag, then with a compatibility list enabling a few laggard sites to still use plaintext, then with a big, interstitial warning. As flag days go, this would be much less painful than the current possibilities, because the process of readying your site for the transition could be almost entirely automated, with little or no risk of breakage. Many sites would probably stay at the transitional state indefinitely, but users would still be getting all the benefits they expect from TLS today.