Starting a new job is hard. My first few weeks at
Palantir were no different. There was training…
of a sort. Things really got interesting, though, when I was given my
first real piece of work: take a humongous, incomprehensible script,
clean it up and make it work. I could have dived straight in and started
to rewrite it, adding features as I went, but I didn’t. I want to
explain my reasons.
First things first: there were no tests. None. Zip. More than that, the
paradigm used made it very hard to simply wrap tests around the script.
It needed some love first. I’ve talked about this a lot before, but I
want to reiterate it: legacy code is everywhere, and most of it is not
just untested, but untestable, and there’s no way you or I, as a
developer new to the code, understands it well enough to just start
tweaking. The answer is fairly simple though: we can’t be sure of the
exact nuances of the code, but we can figure out roughly what the
script (or application, or component) does. With a bit of inspection,
we should be able to figure out what the inputs are, and then plug stuff
in and determine the corresponding outputs.
And then we introduce the golden master.
I’ve probably explained this before, but I’m going to do it again. A
golden master generally consists of a bunch of inputs and a bunch of
expected outputs. The inputs can be randomly generated—in fact, if it’s
possible, you should definitely be doing this randomly. In my case, this
meant a random sample of approximately 1000 items—a tiny portion of the
real data, but enough that I could see when I’d regressed. I then cached
the output of the script—a ton of XML files—when these 1000 numbers were
plugged in. Because of the sensitivity of the data, the only thing
stored in version control was those 1000 IDs. Everything else was local
to my machine or in the database.
Next, I wrote an
ant build script that ran
this over and over again, and produced a diff (using
diff) between the
expected output and the actual output. If there was any difference, it
failed the build. This isn’t perfect, as changes can be intentional as
well as unintentional, but it’s enough to ensure that things are
working. If I expected to see the change, I simply overwrote the golden
master with the new files.
Total test running time: three minutes. Down from who knows how long I’d
spend manually checking that I didn’t break anything? I could do better
(and did, later), but this was enough for now.
Only then did I start reading the code in detail. The reason for this is
I get fairly critical when I read code. I can’t help cleaning it up as I
go. This provides an opportunity to learn much more about it—if I can
watch the tests break every time I think I’ve understood things, I’ll
learn much faster. I’m refactoring only at this stage, not adding
features, so there should be no changes. That said, there were a few—I
accidentally fixed a few bugs by simplifying the code. Situations like
that called for extra-special care when reading the diffs to ensure the
change was actually good. Quite often, I’d revert and intentionally
make the same change without any of the other refactoring so I could be
sure it made sense.
Total number of commits: over 40. And I’ve delivered practically
I kept at this, moving things around, changing the code to be more
functional when it made things cleaner, injecting rather than
constructing, making things more deterministic, extracting methods and
classes… you get the idea. Total time taken: about a week.
Yup. A week.
That’s a long time. Granted, part of it was me learning my way, but most
of it was getting the code under some degree of test and cleaning it up.
Was it worth it? Definitely. I spent the next week cranking out
features. It was easy, because instead of diving straight in, I’d spent
the time to get everything in order. The golden master test went from
three minutes to 30 seconds and stopped hitting a server as part of
this, making it even easier to ensure I hadn’t broken anything.
That original file is now half the size and does so, so much more, much
much faster than when I started. I’m pretty happy with it.