25 Nov 2014

It's Time We Move to Secure Programming Languages

The NSA scandal reminded us that all software has serious security holes. Whether it’s PC OSs like Windows, OS X or Linux, mobile OSs like Android or iOS, apps (whether mobile or PC), web apps, modems and routers at home, core routers used by ISPs, or other types of devices, all software and firmware has glaring security holes. And if it doesn’t right now, the next version most probably will.

As another example, most versions of iOS are jailbroken in due course. And if you can jailbreak your phone, it means there’s a security hole that others can and will exploit as well.

This is a scary state of affairs. There are many reasons for it, but one of them is the insecure languages we use so much of the time. They are the shaky foundation we’ve built the Internet and all our computing infrastructure and systems on. For example, buffer overflow bugs are still being found and turned into zero-day attacks. If something has been happening regularly, you can be sure it will keep happening as long as we keep doing the same thing. As Einstein put it, insanity is doing the same thing over and over again and expecting different results.

We need to switch course. I propose that most software [1] be built in languages that have certain security properties:

To start with, we need memory safety, which means not having C-like pointers. This is the foundation for a lot of other security properties. Without memory safety, you can’t really have many other kinds of security.

Memory safety implies a number of other decisions about the language: for example, garbage collection [2] and compulsory bounds-checking of array accesses. Arrays, objects and local variables must be initialised before they can be used, or implicitly zero-initialised by the runtime, to avoid leaking information from dead objects.

All these must be ironclad guarantees, unlike in C++, where a lot of behaviour is undefined, meaning all bets are off if you do certain things. That won’t do here. A bad array index must result in an exception no matter what. A freshly allocated array must be zero-initialised even in the presence of race conditions.
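
As a concrete illustration, here is roughly what those two guarantees look like in Java, which already provides them:

    // Java already gives both guarantees: zero-initialised arrays and
    // bounds-checked accesses that throw rather than read adjacent memory.
    public class MemorySafetyDemo {
        public static void main(String[] args) {
            int[] scores = new int[4];       // freshly allocated array: all elements are 0
            System.out.println(scores[0]);   // prints 0, never leftover memory contents

            try {
                System.out.println(scores[10]);   // bad index
            } catch (ArrayIndexOutOfBoundsException e) {
                // the runtime throws, no matter what
                System.out.println("Bad index rejected: " + e.getMessage());
            }
        }
    }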

The language should also be strongly typed, so that, for example, you can’t invoke a method with the wrong number or type of arguments. This checking can happen at compile time (static typing) or at run time (dynamic typing); either is fine, but it needs to be done, with no way to bypass it.
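
Java, for instance, checks this both at compile time and at run time; even reflection, the usual escape hatch, is still checked. A small sketch:

    import java.lang.reflect.Method;

    // Even when the compile-time check is sidestepped via reflection,
    // the runtime still refuses a call with the wrong argument type.
    public class StrongTypingDemo {
        static int square(int x) { return x * x; }

        public static void main(String[] args) throws Exception {
            Method m = StrongTypingDemo.class.getDeclaredMethod("square", int.class);
            try {
                m.invoke(null, "not a number");   // wrong type of argument
            } catch (IllegalArgumentException e) {
                System.out.println("Rejected: " + e.getMessage());
            }
        }
    }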

Then, you need sandboxing. Using a third-party library shouldn’t mean granting it all the permissions you have. If, for example, you’re using an XML parser, it shouldn’t be able to access the disk, but that’s exactly what happens in some attacks.

So, you want to be able to use a library while granting it only certain permissions. This is a new way of thinking. It consists of dividing up an app into libraries, either first-party or third-party. You can even think of a portion of your app that does a defined task as a library, and sandbox it. Or you can split your app into multiple portions, each of which has the minimal set of permissions needed for it to do its job. For example, in a notes app, you might have a network part, a disk or database part, and a UI part. This means that a vulnerability in the UI code, for example, can’t read or write sensitive files.

This needs to be a way of thinking about programming. For this paradigm to be successful, you should be able to create thousands of sandboxes within your app, with negligible overhead. And you should be able to nest sandboxes. This is needed in the case where your library invokes a lower-level library in a sandbox, but your library is itself sandboxed. Obviously, code running in the inner sandbox then has the intersection of permissions specified in each of the sandboxes.

The language should also give you a flexible way to specify what permissions a sandbox has. This would be a callback interface that you implement. Any method in the callback that you don’t implement would default to “deny”. This would be far more flexible than reading a config file or some other fixed mechanism.
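
To give a feel for what such a callback could look like, here’s a rough sketch using Java’s existing SecurityManager as a stand-in: deny by default, with one explicit “allow” rule. A real implementation would be scoped per sandbox rather than process-wide, and would also have to allow the runtime’s own housekeeping permissions:

    import java.io.FilePermission;
    import java.security.Permission;

    // Sketch only: a deny-by-default permission callback, expressed with the
    // existing SecurityManager API. Anything without an explicit allow rule is refused.
    public class DenyByDefaultManager extends SecurityManager {
        @Override
        public void checkPermission(Permission perm) {
            // The one thing this sandbox is allowed to do: read files.
            if (perm instanceof FilePermission && perm.getActions().equals("read")) {
                return;
            }
            // No rule implemented for it, so deny.
            throw new SecurityException("Denied: " + perm);
        }
    }

    // Installed with: System.setSecurityManager(new DenyByDefaultManager());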

But what methods are protected by the sandbox? This should also be as flexible as possible. Rather than having a hardcoded list of permissions, it should be possible for a sandbox to intercept any method call. This fixes issues such as not being able to whitelist something at the right granularity, or, in the other direction, ending up with a complex and unwieldy list of permissions. Or forgetting something when defining the list of permissions, in which case you can’t go back and fix it without breaking backward compatibility.
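
You can approximate this kind of interception today with a dynamic proxy wrapped around a library’s interface. The XmlParser and PermissionPolicy names below are made up for illustration, and a real sandbox would of course also mediate what the library does internally, not just the calls into it:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    // Sketch: every call through the interface is intercepted and checked
    // against a caller-supplied policy callback, deny by default.
    interface XmlParser { String parse(String xml); }
    interface PermissionPolicy { boolean allows(Method method, Object[] args); }

    public class Sandboxer {
        @SuppressWarnings("unchecked")
        static <T> T sandbox(T target, Class<T> iface, PermissionPolicy policy) {
            InvocationHandler handler = (proxy, method, args) -> {
                if (!policy.allows(method, args)) {
                    throw new SecurityException("Denied: " + method.getName());
                }
                return method.invoke(target, args);
            };
            return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                    new Class<?>[] { iface }, handler);
        }
    }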

A language also needs a way to mark untrusted data as being tainted, to make attacks like XSS or SQL injection harder.
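
In a language with built-in tainting, the compiler or runtime would track this for you. As a rough library-level approximation (the Tainted class and its methods are made up here), the idea is that raw input can only leave the wrapper through a sink-specific sanitiser:

    // Sketch: untrusted input is wrapped, and there is deliberately no plain getter,
    // so code can't pass the raw value to an HTML or SQL sink by accident.
    public final class Tainted {
        private final String raw;

        public Tainted(String untrustedInput) { this.raw = untrustedInput; }

        public String forHtml() {
            return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        public String forSqlLiteral() {
            // illustrative only; real code should use parameterised queries instead
            return raw.replace("'", "''");
        }
    }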

Arithmetic overflow and underflow should throw exceptions rather than being vectors for attacks.
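
Java 8’s checked arithmetic helpers show the difference between the two behaviours:

    // Plain int arithmetic wraps silently; Math.addExact throws instead.
    public class OverflowDemo {
        public static void main(String[] args) {
            int nearMax = Integer.MAX_VALUE - 1;
            System.out.println(nearMax + 10);   // wraps around to a large negative number
            try {
                Math.addExact(nearMax, 10);     // checked addition
            } catch (ArithmeticException e) {
                System.out.println("Overflow rejected: " + e.getMessage());
            }
        }
    }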

Then, the trusted computing base should be as small as possible. This includes the runtime or VM of a language. So, for example, as much of the standard library as possible should run with the permissions of the app, and not in a special, privileged mode. To give a Java example, there’s a difference between FileInputStream and ArrayList. The FileInputStream class needs to be able to read files and to check whether the app has the permission to do so. A bug in some of this code may result in permitting access to files that the application doesn’t have access to. But a bug in ArrayList shouldn’t result in unauthorised file access, no matter how carelessly written ArrayList might be. In other words, though both FileInputStream and ArrayList are part of the standard library, FileInputStream is part of the trusted computing base, while ArrayList shouldn’t be. To generalise, most of the standard library should merely be treated as part of the application, and not as part of the trusted computing base.

This applies at the kernel layer as well. As much of the kernel as possible should be split into processes [3], each of which has the minimum permissions needed for its job. So, for example, if there’s a bug in the TCP/IP stack, it shouldn’t result in unauthorised file access.

If most programs are built using languages with these or similar security properties, we’ll all be much safer. Let’s stop building on a creaky foundation, and inviting the NSA, or other hackers, to hack us.

[1] With specific exceptions like codecs, where performance trumps everything else. And even these exceptions should be sandboxed, using something like Native Client or XPC.

[2] How else can you prevent errors like double-free, or freeing using the wrong pointer type, or using delete instead of delete[] for arrays?

[3] This, of course, is the argument for microkernels.

[4] There are also other ideas, such as not running native apps at all (Chrome OS). Or encrypting all TCP connections. Or having your OS sandbox apps, by having the apps define a list of permissions, and letting the user check the list.
