15 Aug 2008

Scalable Language Design

Let's say one wanted to design a powerful language that would be widely used, from servers to the desktop to mobiles. The exact set of abstractions doesn't matter -- most programmers are stuck with Java or C++, especially on the client, so any decent language will be a step up for them. So how would one go about it?

First, don't insist on a VM. VMs may be useful in some circumstances, but there's no reason all execution must happen in one. Moreover, VMs don't work well on resource-constrained devices like mobile phones. Programs also acquire a versioning dependency on the VM, which can change incompatibly from release to release and break software. And the sheer size of a VM has its own cost in download size and memory footprint. So any language that aims for wide use should compile to native code. If you come up with a first-rate VM, great, but don't insist that everyone use it. Support both native and virtualized execution.

Second, the language should be open-source, and by that I mean the canonical implementation itself should be open-source. In 2008, I doubt there's room for a language that isn't fully open yet aims to be widely used.

Finally, prioritize power, or programmer productivity, by default. As Paul Graham puts it, "In language design, we should be consciously seeking out situations where we can trade efficiency for even the smallest increase in convenience." But if you do only that, you'll end up with a language that's unacceptably slow for many applications. Would you like your web browser written purely in Ruby? I wouldn't. The solution is not to remove powerful but slow language features, which leaves you with a crippled language -- that's a false dichotomy. The 80/20 rule shows us the way: the customary mechanisms of the language should prioritize power over speed, but more optimized, less convenient mechanisms should be available for the hotspots.

So by default, the language should be dynamically typed, but you should be able to declare the interface of a type (as with a Java interface or an Objective-C protocol), its implementation type (a subclass of Foo), or even its exact class. By default everything is passed by reference (and basic types are objects), but as an optimization you can declare that, for certain types, object identity doesn't need to be preserved, in which case the compiler can pass them by value. Objects are just syntactic sugar for closures, but if you use the standard object mechanism, the compiler may be able to eliminate the dynamic binding and inline the call. Array lookups are bounds-checked by default, but there is also an unchecked access mechanism you can use in critical inner loops. By default, objects are garbage-collected and memory integrity is guaranteed, but you can allocate from a separate memory pool with manual release, and use pointers to manipulate memory directly. Arrays are resizable (and dynamically typed), but the language also has a fixed-size, fixed-type array. And so forth.
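As a rough analogy, and not a design for the language proposed here, Rust already pairs several safe defaults with explicit fast paths. It is statically typed by default, so it inverts the typing default described above, but three of the pairs map loosely: dynamic dispatch through a trait object versus a call on a known concrete type, bounds-checked indexing versus `get_unchecked` inside an `unsafe` block, and a growable `Vec` versus a fixed-size array. The names `Shape`, `Square`, `total_area_dyn`, `total_area_exact`, `sum_checked`, and `sum_unchecked` are hypothetical, chosen only for this sketch.

```rust
// Hypothetical illustration: Rust as a stand-in for "convenient by default,
// fast on request". Unlike the language proposed above, Rust is statically
// typed by default.

trait Shape {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Flexible default: any Shape, with `area` looked up through a vtable at run time.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Opt-in: the caller commits to one exact type, so the compiler knows
// which `area` is called and is free to inline it.
fn total_area_exact(squares: &[Square]) -> f64 {
    squares.iter().map(|s| s.area()).sum()
}

// Default: every `v[i]` is bounds-checked and panics if `i` is out of range.
fn sum_checked(v: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

// Hot-loop escape hatch: no bounds check. This is sound only because the
// loop range guarantees i < v.len(); `unsafe` marks that promise.
fn sum_unchecked(v: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..v.len() {
        // SAFETY: 0 <= i < v.len() by construction of the range.
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}

fn main() {
    // Resizable, heap-allocated array: the convenient default.
    let mut growable: Vec<f64> = Vec::new();
    growable.push(1.5);
    growable.push(2.5);

    // Fixed-size, fixed-type array: the length is part of the type.
    let fixed: [f64; 3] = [1.0, 2.0, 3.0];

    println!("{}", sum_checked(&growable));
    println!("{}", sum_unchecked(&fixed));

    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square { side: 2.0 }), Box::new(Square { side: 3.0 })];
    let squares = [Square { side: 2.0 }, Square { side: 3.0 }];
    println!("{} {}", total_area_dyn(&shapes), total_area_exact(&squares));
}
```

Whether the fast path actually pays off depends on the compiler: in a loop as simple as `sum_checked`, the optimizer can often prove the index is in range and drop the check on its own, and it may or may not devirtualize the `dyn` call. The point is only that the programmer can state the stronger guarantee where it matters, while the default stays safe.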

I don't accept that it's impossible to design a language that is a joy to program in and, at the same time, fast enough.
