This is a somewhat expanded version of the talk I’m giving to the Vancouver Software Developers’ Network tomorrow. I don’t think there is much that I haven’t said in previous blog postings, but I wanted to gather it all in one place.
“What is the variability between programmers?” is a question I was curious about when I started my MS CS at UBC. In Silicon Valley, I’d heard a rule of thumb that there was a 100-to-1 difference in programmer productivity. My husband had heard ten-to-one; Joel on Software quotes one of his old profs to suggest that it is ten-to-one or larger.
Here is a drawing (deliberately crude, so nobody would think it represents actual data) of what I thought the histogram would look like, with the time taken to finish a task on the x-axis and the number of programmers who finish in that time on the y-axis.
The first problem here is that this just measures time, not quality. It is presumably faster to write lousy code than to write code that is clean, easy to read, easy to maintain, etc. That is a legitimate issue, but unfortunately my reading of peer-reviewed papers has not convinced me that anybody really knows how to measure quality.
There are some people who have measured coding speed, however. I have reported previously on experiments by DeMarco and Lister, Dickey, Sackman, Curtis, and Ko, which measure the time it takes a number of programmers to do a task. What I found is that the time histogram curve actually looks like this:
- The worst programmer isn’t 100 or 10 times slower than the best; the worst programmer, found at the end of a very long tail, is infinitely worse. If you think about it, I am going to be infinitely faster than a walrus. It is hard to program with those flippers. (I actually worked on a project with a contractor who, after a year, was let go because, despite repeated requests, he had not checked in a single line of his code. I think that counts as infinitely slow.)
- The median programmer is about two to four times slower than the fastest on single tasks. (Because of regression to the mean, this advantage should get smaller with many tasks.)
- The curve is wickedly bunched up on the left (fast) side, with a long tail to the right. This makes sense: there isn’t much you can do to get faster, but there are a LOT of things you can do to get slower. (Never checking in your code, for example.)
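The regression-to-the-mean point above can be illustrated with a toy simulation. This is a minimal sketch, not real data: it assumes each programmer has a fixed skill multiplier plus independent per-task luck, both drawn from log-normal distributions. The distributions, their parameters, and the function name `median_to_fastest_ratio` are all assumptions made up for illustration. Under those assumptions, the median-to-fastest ratio is largest on a single task and shrinks as times are summed over more tasks, because luck averages out while skill does not.

```python
import random
import statistics


def median_to_fastest_ratio(num_programmers, num_tasks, seed=0):
    """Return (median total time) / (fastest total time) for a toy population."""
    rng = random.Random(seed)
    totals = []
    for _ in range(num_programmers):
        # Hypothetical fixed skill multiplier for this programmer (lower is faster).
        skill = rng.lognormvariate(0.0, 0.2)
        # Every task adds independent "luck" on top of that skill.
        total = sum(skill * rng.lognormvariate(0.0, 0.5) for _ in range(num_tasks))
        totals.append(total)
    return statistics.median(totals) / min(totals)


if __name__ == "__main__":
    for tasks in (1, 5, 20):
        ratio = median_to_fastest_ratio(200, tasks)
        print(f"{tasks:2d} task(s): median is {ratio:.2f}x slower than the fastest")
```

Changing the two sigma values changes how much of the spread is skill versus luck; the numbers here were picked only to make the effect visible.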
What implications does this curve have?
- Don’t spend a lot of effort on hiring the absolute best; spend lots of effort on avoiding hiring losers.
- Don’t spend a lot of effort on learning how to type faster; spend lots of effort on figuring out how to avoid getting stuck.
“Don’t get stuck” is easier said than done, of course, but there are things you can do.
- When you have a question — e.g. “Why is Foo set to 3 instead of 5?” — write down three hypotheses for what the answer might be. This can help you avoid confirmation bias. I came up with this idea after reading that breadth-first-ish approaches to problems are more successful than more depth-first-ish searches. I don’t have any research on it, but writing three hypotheses helps me a lot.
- Explain your question/problem to someone. It doesn’t even need to be someone who knows anything about coding. There is little academic research on verbalizing, but there are lots of anecdotes saying that “rubber ducking” is useful, and that has been my experience as well. Verbalizing might also be part of why I find writing down three hypotheses so useful.
- Ask for help! Someone familiar with the particular area that you are investigating might be harder to find than a rubber duck, but can sometimes be more useful.
  - Note to managers: your new hires will probably feel uncomfortable interrupting you to ask for help. Instead of making them come interrupt you, go to them. Once or twice per day, stop by their office and spend some time with them. Ask to see what they are doing; pair program with them for a little while. When they are new, you are more likely to catch them being stuck than not being stuck, so you can proactively un-stick them. Even if they are not stuck, you can still probably give good pointers on tools and techniques.
- Use tools to help you find the answers to your questions! There are all kinds of great ones available now.
  - Omniscient debuggers: Debuggers like odb and undoDB keep track of every variable’s state change and then let you trace backwards to where that variable changed state. (Note: Cisco also made an Eclipse plugin for omniscient debugging in C++, but for internal use only.)
  - Many code coverage tools will also color lines based on whether they were executed or not. This is a cheap way to see which execution paths were taken! Examples include Visual Studio, the Intel C++ Code Coverage Tool, and the Eclipse plug-in EclEmma.
  - One of the questions that I frequently ask is, “How do I get information from class Foo over to class Bar?” Prospector and Strathcona can help with that. Strathcona looks for examples of existing code in your code base that get you from Foo to Bar; Prospector looks for existing code, and also traverses the tree of classes that can be reached from a given class to answer that question.
- Use tools to keep you from having to get stuck in the first place.
  - FindBugs looks for code that “looks funny” and which is likely to have errors in it.
  - JML allows the user to specify all kinds of “contracts” about how a method will work (preconditions, postconditions, invariants, etc.) in a very rich way. If anybody breaks those contracts (e.g. by passing illegal arguments), it gets flagged. It sounds like it would be really tedious to generate all those promises, but the tool Daikon can help. Daikon can generate promises based on actual run data; if something changes to violate the promises, it will flag it. (The contracts also work as extra documentation.)
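To make the contract idea a bit more concrete, here is a rough sketch in Python. It is not JML syntax and not how Daikon actually works; the names `contract`, `integer_sqrt`, and `infer_range` are hypothetical helpers invented for this illustration. The decorator checks a precondition and a postcondition at call time, and `infer_range` is a very crude stand-in for the "generate promises from run data" idea: it only guesses a min/max range from values it has observed.

```python
import functools


def contract(pre=None, post=None):
    """Attach an optional precondition and postcondition to a function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check the caller's side of the contract before running.
            if pre is not None and not pre(*args, **kwargs):
                raise ValueError(f"precondition violated in {func.__name__}")
            result = func(*args, **kwargs)
            # Check the function's side of the contract after running.
            if post is not None and not post(result):
                raise ValueError(f"postcondition violated in {func.__name__}")
            return result
        return wrapper
    return decorate


@contract(pre=lambda x: x >= 0, post=lambda r: r >= 0)
def integer_sqrt(x):
    """Largest integer r with r * r <= x."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r


def infer_range(observed):
    """Toy inference: guess a range invariant from observed values."""
    low, high = min(observed), max(observed)
    return lambda value: low <= value <= high


if __name__ == "__main__":
    print(integer_sqrt(10))
    looks_normal = infer_range([integer_sqrt(n) for n in range(50)])
    print(looks_normal(5), looks_normal(100))
```

Calling `integer_sqrt(-1)` raises a `ValueError` from the precondition check, which is the "it gets flagged" behaviour in miniature. The contract on the decorator line also doubles as documentation for the function.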