4 Feb 2017

How to Have Great Code Quality with Minimum Overhead

We all want great code quality. But many programming practices that allegedly improve code quality also add complexity and overhead. I've worked on codebases where each class does nothing but invoke the next class. Such codebases are hard to understand, modify, refactor, or debug. They are a liability.

So how do you have great code quality with minimum overhead?


YAGNI

Solve problems that exist now. Don’t try to foresee problems that may exist at some unspecified point in the future. You Aren't Going to Need It.

When I tried to foresee requirements, I'd usually get it wrong: the requirements that did come along were different from what I'd guessed, and the supposedly extensible code I'd written had to be refactored anyway. The extensibility didn't help. In fact, it hurt, because there was now more code to refactor.

Martin Fowler lists four costs of building something before it's needed. If you take a week to build something that's not yet needed, you could instead have spent that week building something useful right now: that's the first cost. If you're working on a feature and decide to make it more extensible for another feature you're trying to foresee, the first feature gets delayed, so your users pay the cost of getting it a week later than they could have: that's the second cost. The code is more complex, which makes it harder to understand, modify or debug: that's the third cost. And when you eventually realise that you didn't build it the right way (because you built it before you had all the information and context you needed), you have to repair what you built: that's the fourth cost.

Whenever you're about to apply some "good" object-oriented principle, ask yourself if you'll derive any concrete benefit from it today. If so, go ahead and implement it. If not, ask yourself a second question: will it be just as easy to implement later as it is now? If so, put it off for later.

But the first time something does become a problem, solve it immediately. If I find myself confused about what a class does, even for a second, I immediately rename or refactor it. If you notice a problem and put off fixing it, you pay the price again and again, and end up with worse code quality.

When you do find a bug, try to find the underlying design that made the bug possible, and fix it. For example, suppose you have a field whose value should only keep increasing. But you find that it sometimes decreases. After fixing the problem, see if, instead of a setter, you can have an increaseFoo() method that takes in an unsigned int. Or keep the setter but add an assertion that the new value > old value.
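
A minimal Swift sketch of both options (increaseFoo is from the description above; everything else is illustrative):

class Counter {
    // Option 1: no setter at all. Callers can only move the value upwards.
    private(set) var foo: UInt = 0

    func increaseFoo(by amount: UInt) {
        foo += amount
    }
}

class LooserCounter {
    // Option 2: keep the settable property, but assert it never goes backwards.
    var foo: UInt = 0 {
        didSet { assert(foo > oldValue, "foo should only increase") }
    }
}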

Don't make choices that will make refactoring harder, because that amounts to locking yourself into the suboptimal design you have today. If you put off refactoring until you need it (as I suggested above), you also need the flexibility to be able to do it whenever needed. One example of this rule is avoiding global mutable state: if every class can modify the state of every other class, refactoring becomes harder. Another is not letting classes grow to a double-digit number of methods (excluding private ones), or to more than a few hundred lines of code. Disentangling such classes is going to be hard. Don't get into situations that are hard to get out of.


Ignore the object-oriented bureaucrats

There are some people, especially in the Java world, who swear by the rulebook, telling you to always do something or never do something else, but who can't tell you how, if at all, the rule helps you in this particular case. They'll tell you never to use singletons, or to unit test everything, or to inject all dependencies. One person even told me that whenever I define a class, I should define a factory for it, and that it's bad practice to just use the new operator. These are the people who, instead of writing:

2 + 3

... will write:

ArithmeticOperationsFactory
    .newAdder()
    .addOperand(new PrimitiveOperand(2))
    .addOperand(new PrimitiveOperand(3))
    .execute();

For some reason, Java attracts a lot of these bureaucrats. I worked on some over-engineered Java codebases where no class did anything but invoke another class.

Pay these bureaucrats no heed.

Which isn't to say that you shouldn't learn what the design patterns are. You should — so that you can use them when appropriate. Just today, I was trying to learn what view models and coordinators are, and how I could use them in my iOS app. But, once you learn them, you should use them only if you have a concrete problem today that the design pattern addresses.

As another example, singletons rile up many Java programmers, but when used appropriately, singletons are great. Which is why they exist in the first place. Apple, for example, widely uses singletons in the iOS API, which makes the API much simpler. Weigh the pros and cons, use your brain, and decide what makes sense in your context. As David Heinemeier Hansson puts it eloquently:
You look at the code you had before applying the pattern and after it. Did it make the code simpler to understand? Clearer? Prettier? If you can’t easily see whether that’s so, you’ve been [fooled].
Also keep in mind that complexity begets complexity. For example, people tell you not to use singletons because they make testing harder. That applies only if you're actually going to write tests. Don't start by assuming you need automated tests. If you do write them, then in addition to the overhead of writing the tests themselves, you'll pay the overhead of being forced into a design more complex than the one you might otherwise use.

As another example, there's an over-designed dependency-injection framework for Java called Guice. When you use it, compile-time errors become runtime errors. It's hard to read the code and understand what it does, because you can no longer ask Eclipse to show you all the callers of a particular function. I had to modify a function to throw an exception, run it, and see what the call tree was. Which didn't help either, since the stack trace was 50 levels deep, a good part of which was Guice internal methods calling each other. I also had issues with the order of initialisation. Guice turned out to be so slow on Android that a new library, Dagger, had to be built. Dagger addresses the subset of Guice's functionality that is most often used.

Guice is used a lot at Google, but there are also many jokes about it. One depicts a burning house with the caption: "Guice: turning compile-time errors into runtime errors since 2009".

Another is based on the well-known quote:
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
This one goes: "You have a problem. You decide to use Guice. Now you have a Provider<Problem>."

The point isn't that you should never use dependency injection, or that you should never use Guice. Rather, you should think about whether you'll benefit from dependency injection, since it has real costs. If you do decide to use DI, you shouldn't automatically use an over-designed library like Guice without asking if a simpler library like Dagger does what you need.


What are some things I don't apply YAGNI to?

First, naming: I spend a few seconds trying to find the right name for every class and method. The wrong name makes it confusing and unclear every time you use it. Get the names right, and half the battle is already won. That may be an exaggeration, but I hope the point is clear. (As the saying goes: There are only two hard things in Computer Science: cache invalidation and naming things.)

Second, DRY: Don't repeat yourself. When I find myself saying the same thing twice, no matter how small, I refactor. For example, named constants. I'm not religious about them, and don't mind an inline constant like:

button.borderWidth = 2;

This is actually smaller and more readable than:

static final int BUTTON_BORDER_WIDTH = 2;

button.borderWidth = BUTTON_BORDER_WIDTH;

So, I don't use a named constant. But if I create multiple buttons with the same border width, I don't repeat myself by writing:

button2.borderWidth = 2
button3.borderWidth = 2
button4.borderWidth = 2

Instead, I create a named constant for it.
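
Something like this (the constant's name is just my choice):

let buttonBorderWidth = 2

button2.borderWidth = buttonBorderWidth
button3.borderWidth = buttonBorderWidth
button4.borderWidth = buttonBorderWidth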

A constant that's used only once isn't giving you much benefit, only adding overhead.

A lot of other object-oriented practices boil down to DRY, anyway. So, instead of saying, "we should XYZ so that we don't repeat code in the future", fix it when you do find yourself writing the same thing the second time. If you keep DRY in mind, many of the other object-oriented practices take care of themselves.

Third, state: State is the values of your variables and data structures. I pay a lot of attention to state, because if you get it wrong, your users will notice. The state of the app — the data in it — is the whole reason they're using your app. Unlike the typical object-oriented nonsense: your users aren't going to notice if you don't use a factory.

I do everything I can to ensure that the state is right. I minimise mutability, declaring everything with let rather than var in Swift (or final in Java). I never have global mutable variables. I never let classes reach into other classes and modify their state.

Again, some common object-oriented practices often don't help, and are in fact bureaucratic, like prohibiting public variables. Imagine you wrote:

public int age;

And someone told you that variables shouldn't be public, asking you to change it to:

private int age;

public int age() {
  return age;
}

public void setAge(int age) {
  this.age = age;
}

You may be told this is better encapsulated, but it's actually not. The point of encapsulation is to limit what other classes can do, and that's not being achieved here: any other class can still set the age value to whatever it wants. You're not actually limiting what values the variable can have, or how it's allowed to change, or when. You're only changing the syntax with which someone sets the variable, which is no benefit at all.

All this does is add bureaucracy. In this scenario, I would use public variables (assuming this isn't an API). They can be refactored just as well when needed, using Eclipse's Encapsulate Field refactoring. Or, in a small project, by just changing public to private, building, and fixing compiler errors for 30 seconds. But the first time I do something that's best done in a setter, like validation, or computing the value on demand instead of storing it, I encapsulate it. That way, I don't have boilerplate code in my project that isn't adding value, but at the same time, I don't skip having an accessor when it actually helps.
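
In Swift, the same progression is even cheaper, because a stored property can grow validation later without call sites changing at all. A sketch (the type and the check are hypothetical):

class Person {
    // Started life as a plain stored property. The didSet observer was added
    // later, the first time validation was actually needed; callers still
    // just write person.age = 25.
    var age: Int = 0 {
        didSet { assert(age >= 0, "age can't be negative") }
    }
}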

I encapsulate state in classes to the extent possible, limiting how other classes can modify it. This lets me restrict the set of operations that can be performed on a piece of data to the ones that should actually be performed.

For example, I'm building a camera app, which supports taking a burst of photos, each of which is represented by a Photo. I could use a list of Photos to represent the burst, but I instead defined a Burst class that has a private list of Photo. That way, I can restrict what operations can be done on the burst. For example, a new photo should be appended, not inserted at the beginning or middle. You can't delete photos from a burst, or reorder them. Once you finish taking a burst of photos, you can't take a photo later and add it to the burst. To correctly model these constraints, I have only a public addPhoto() method, which appends the given photo to the internal list. Clients have no way to add photos at the wrong place, or to reorder them. And I have a method seal() that's called on the burst as soon as the last photo in the burst has been taken and added to the burst. seal() sets a flag on the burst and, if addPhoto() is called afterward, it asserts.
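
A Swift sketch of that Burst class (simplified; addPhoto() and seal() are the methods described above, everything else is illustrative):

struct Photo { /* pixels, metadata, ... */ }

class Burst {
    private var photos: [Photo] = []
    private var isSealed = false

    // The only way to get a photo into a burst: appended at the end.
    // There's no insert, no remove, no reorder.
    func addPhoto(_ photo: Photo) {
        assert(!isSealed, "Can't add a photo to a burst that has already ended")
        photos.append(photo)
    }

    // Called once the last photo of the burst has been taken.
    func seal() {
        isSealed = true
    }
}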

Be wary of threads and explicit synchronisation; they're a recipe for bugs. As far as possible, do everything on the main thread, using async APIs.
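
For example, rather than spawning a thread and protecting shared state with locks, you can let an async API do the work off the main thread and hop back onto it in the callback. A sketch (the URL and the handling are illustrative):

import Foundation

let url = URL(string: "https://example.com/data.json")!
URLSession.shared.dataTask(with: url) { data, _, error in
    // The completion handler runs on a background queue, so come back to
    // the main thread before touching any app state or UI.
    DispatchQueue.main.async {
        guard let data = data, error == nil else { return }
        print("Received \(data.count) bytes")   // update state here, lock-free
    }
}.resume()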

Don't take on risks that require hours of debugging to figure out, like race conditions or global mutable state. The kind of shortcut that's fine to take is one whose cost is obvious, like non-private fields: you don't need to debug for hours to learn that a class's fields aren't private.

The Law of Demeter points to another good code smell: classes should talk to their friends, not to strangers. Put differently, don't chain multiple dots in an expression, like a.b.camera.takePhoto(). When you see this code, ask yourself: does a need to expose the entire object b, or can it expose or forward just one function? Does b need to expose the entire camera, or can it expose only takePhoto()?
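
A sketch of the second option (all the types here are hypothetical stand-ins):

class Camera {
    func takePhoto() { /* ... */ }
}

class B {
    private let camera = Camera()
    // Expose the one operation callers need, not the whole Camera.
    func takePhoto() { camera.takePhoto() }
}

class A {
    private let b = B()
    // Forward again, so callers write a.takePhoto() instead of a.b.camera.takePhoto().
    func takePhoto() { b.takePhoto() }
}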

Fourth, assertions: This is technically part of state, but important enough to call out separately. I add every possible assertion about my data. Functions I write often begin by asserting their arguments and the state of the object. Since assertions are disabled in release builds, I can add them anywhere and everywhere without worrying that an invalid assertion might cause the app to crash in users' hands. If in doubt, just add the assertion; if it turns out to be invalid and fires, remove it (and, as a positive side-effect, you'll have learnt something about your program).
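
For instance, a function might open like this (the example domain and the checks are made up):

class Account {
    var balance = 0
}

func transfer(_ amount: Int, from source: Account, to destination: Account) {
    // Assert the arguments...
    assert(amount > 0, "Transfer amount should be positive")
    assert(source !== destination, "Source and destination should differ")
    // ...and the state of the objects involved.
    assert(source.balance >= amount, "Source account can't go negative")

    source.balance -= amount
    destination.balance += amount
}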

If you create a high barrier to adding assertions, you tend to add fewer, which isn't good. I almost never use preconditions, which are enabled in release builds, for exactly this reason. If I have to stop and think each time, "Should this be an assertion or precondition?", that adds too much overhead, and I'll end up writing fewer checks. And it reduces the speed with which I can deliver features. You want maximum productivity, great code quality, and minimum overhead.

Assertions are better than documentation, which isn't enforced, and is an overhead for you to read and follow. And inside a method's implementation, documentation can't tell you whether the caller kept up their end of the bargain; an assertion can.

Assertions are better than automated tests, because tests cover only the cases you thought of, which are the ones less likely to have bugs. Besides, assertions run all the time, while you may run a test only once in a while. I've found assertions to be exercised an order of magnitude more often than tests, and to trigger in situations I'd never have thought to write a test for. Tests are applicable when you want to verify sequences: do a, then b, then c, and finally check that the result is what you expect.

Fifth, testing: There's a continuum of views, from "testing destroys my soul, so I never write tests" to "test every class, even if it does nothing but invoke another class." I once had a system where a class A invoked B, which invoked C, which invoked D, which was the only class in the system that did anything useful. But my code reviewer insisted I test A while mocking out B, then test B while mocking out C, then test C while mocking out D. This was a bureaucratic exercise that took up time disproportionate to its benefit. As I worked on the system for a couple of quarters, these tests rarely caught anything, so the cost-benefit ratio was poor.

I strike a balance between "never test" and "always test". I test when I feel it would help.

Imagine you write a class that's particularly algorithmic in nature, and you find yourself wondering, "Does the class I just coded up work?" or "I'm sure there are some subtle bugs, but I don't know what they are." In that situation, testing is a good idea. It gives you confidence that the code works, both now and as you refactor it in the future, without having to test manually each time, which can take up more time than coding up some automated tests once.

For one of my projects, an object database I built in college, I had as much test code as real code, and that gave me great confidence about the code. I could say confidently that my code works, which the other teams didn't seem able to say.

So, testing is great in some situations.

But not in others.

Mocking is usually not worth the effort. If I find myself writing a mock, I consider it a sign that I'm overdoing the testing.

Test at the highest level possible. If you have a class A that does some work and invokes B, which does some work and invokes C, see if you can test A without mocking out B and C. This is sometimes referred to as an integration test or end-to-end test. As opposed to a unit test, which tests only one class by mocking out its dependencies. In this example, if you wanted to write unit tests, you'd write an A test that tests A while mocking out B. And a B test that tests B while mocking out C. And a C test.

End-to-end tests are more useful, since you're testing three classes at once, and their interactions. Many bugs lie at the interactions between classes, so unit testing can't catch them. Besides, you're writing one end-to-end test as opposed to three unit tests.
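
As a sketch of the difference, here's the A, B, C example as one end-to-end XCTest, with nothing mocked out (the classes are throwaway stand-ins):

import XCTest

class C { func value() -> Int { return 42 } }
class B { private let c = C(); func doubled() -> Int { return c.value() * 2 } }
class A { private let b = B(); func describeResult() -> String { return "Result: \(b.doubled())" } }

class EndToEndTests: XCTestCase {
    // One test exercises A, B and C together, including their interactions.
    func testResultGoesThroughTheWholeChain() {
        XCTAssertEqual(A().describeResult(), "Result: 84")
    }
}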

I worked on Google Sheets, specifically on printing spreadsheets. This is the code that ran when you selected File > Print, and generated a PDF file to print. This code had a dozen or two classes, and I didn't bother to unit test most classes. Instead, I created two dozen test spreadsheets trying to trigger all edge cases in my code. Some test spreadsheets had merged cells. Some had many tabs in the spreadsheet, as many as Google Sheets allows. One spreadsheet had a million rows. Some had RTL text. Some had charts. Some had images. Some had hidden rows. And so on. I wrote an automated test for each of these test sheets, which would try to print that spreadsheet, and assert something about the resulting PDF. For example, print a sheet with a chart, and assert that the output PDF contains an image. Or print a sheet with a hidden row containing some text, and assert that the PDF does not contain that text.

If something isn't covered by the existing end-to-end tests, or I find myself worrying that a class might have hidden bugs, I first ask myself whether I can come up with an end-to-end test for it. In the above project, I suspected that my code that handles merged cells had an edge case. So I created a test spreadsheet in which all the cells were merged into one, with some text in it, and added an end-to-end test that tried to print the spreadsheet and asserted that the PDF had that text in it. When I ran the test, the code threw an exception, so I fixed the bug. It then no longer threw an exception, but it generated a PDF with a blank cell, rather than the text that was supposed to be in the merged cell. I fixed that bug, too. Only when I couldn't write an end-to-end test to catch something did I write a unit test.
