Review: Some basic determinants of computer programming productivity

Posted in programmer productivity, review at 4:08 pm by ducky

Some basic determinants of computer programming productivity by Earl Chrysler (1978) measured the (self-reported) time it took professional programmers to complete COBOL code in the course of their work and looked for correlations.

He correlated the time it took with characteristics of the program and, not surprisingly, found a bunch of things that correlated with the number of hours that it took: the number of files, number of records, number of fields, number of output files, number of output records, number of output fields, mathematical operations, control breaks, and the number of output fields without corresponding input fields. This is not surprising, but it is rather handy to have had someone do this.

He then looked at various features of the programmers themselves to see what correlated. He found that experience, experience at that company, experience in programming business applications, experience with COBOL, years of education, and age all correlated, with age correlating the most strongly. (The older, the faster.) This was surprising. I’ve seen a number of other academic studies that seemed to show no effect of age or experience. The Vessey paper and the Schenk paper (which I will blog about someday, really!), for example, have some “experts” with very little experience and some “novices” with lots of experience.

The academic studies, however, tend to have a bunch of people getting timed doing the same small task. Maybe the people who are fastest in small tasks are the ones who don’t spend much time trying to understand the code — which might work for small tasks but be a less successful strategy in a long-term work environment.

Or the paper is just messed up. Or all the other papers are messed up.

Gotta love research.

Update: Turley and Bieman in Competencies of Exceptional and Non-Exceptional Software Engineers (1993) also say that experience is the only biographical predictor of performance. (“Exceptional engineers are more likely than non-exceptional engineers to maintain a ‘big picture’, have a bias for action, be driven by a sense of mission, exhibit and articulate strong convictions, play a pro-active role with management, and help other engineers.”)

Update update: Wolverton, in The cost of developing large-scale software, found that the experience of the programmer didn’t predict the routine unit cost. It looks like he didn’t control for the complexity of the code written.


Review: Vessey's Expertise in debugging computer programs: a process analysis

Posted in programmer productivity, review at 4:07 pm by ducky

I really wanted to like Expertise in Debugging Computer Programs: a Process Analysis by Iris Vessey (Systems, Man and Cybernetics, IEEE Transactions on, Vol. 16, No. 5 (1986), pp. 621-637). Von Mayrhauser and Vans, in Program Comprehension During Software Maintenance and Evolution, indicated that Vessey had shown that experts use a breadth-first approach more than novices.

I was a little puzzled, though, because Schenk et al, in Differences between novice and expert systems analysts: what do we know and what do we do?, cited the exact same paper, but said that novices use a breadth-first approach more than experts! Clearly, they couldn’t both be right, but I was putting my money on Schenk just being wrong.

It also looked like the Vessey paper was going to be a good paper just in general, that it would tell me lots of good things about programmer productivity.

Well, it turns out that the Vessey paper is flawed. She studied sixteen programmers, and asked them to debug a relatively simple program in the common language of the time — COBOL. (The paper was written in 1985.) It is not at all clear how they interacted with the code; I have a hunch that they were handed printouts of the code. It also isn’t completely clear what the task termination looked like, but the paper says at one point “when told they were not correct”, so it sounds like they said to Vessey “Okay, the problem is here” and Vessey either stopped the study or told them “no, keep trying”.

Vessey classified the programmers as expert and novice based on the number of times they did certain things while working on the task. (Side note: in the literature that I’ve been reading, “novice” and “expert” have nothing to do with how much experience the subjects have. They are euphemisms for “unskilled” and “skilled” or perhaps “bad” and “good”.)

She didn’t normalize by how long they took to finish the task; she just looked at the count of how many times they did X. There were three different measures for X: how often the subject switched which high-level debugging task they were doing (e.g. “formulate hypothesis” or “amend error”), started over, or changed where in the program they were looking.

She then noted that the experts finished much faster than the novices. Um. She didn’t seem to notice that the count of the number of times that you do X during the course of a task is going to correlate strongly with how much time you spend on the task. So basically, I think she found that people who finish faster, finish faster.
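To make the point concrete, here is a trivial sketch (the numbers are invented) of why a raw count needs to be normalized into a rate before it can distinguish subjects who took different amounts of time:

```java
// Sketch: a raw count of "X per task" mostly tracks task length;
// dividing by time gives a rate that can actually separate behaviours.
public class Rates {
    public static double rate(int count, double minutes) {
        return count / minutes;
    }

    public static void main(String[] args) {
        // A slow subject who switched 30 times in 60 minutes and a fast
        // subject who switched 10 times in 20 minutes look very different
        // by raw count (30 vs 10) but identical by rate (0.5/min each).
        System.out.println(rate(30, 60.0) == rate(10, 20.0)); // prints true
    }
}
```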

She also noted that the expert/novice classification was a perfect predictor of whether the subjects made any errors or not. Um, the time they took to finish was strongly correlated with whether they made any errors or not. If they made an error, they had to try again.

Vessey said that 15/16 experts could be classified by a combination of two factors: whether they used a breadth-first search (BFS) or a depth-first search (DFS) for a solution, and whether they used systems thinking or not. However, you don’t need both of the tests; the systems-thinking test alone accurately predicts 15/16. All eight of the experts always used BFS and systems thinking, but half of the novices also used BFS, while only one of the novices used systems thinking.

Unfortunately, Vessey didn’t do a particularly good job of explaining what she meant by “systems thinking” or how she measured it.

Vessey also cited literature that indicated that the amount of knowledge in a programmer’s long-term memory affected how well they could debug. In particular, she said that chunking ability was important. (Chunking is a way to increase human memory capacity by re-encoding the data to match structures that are already in long-term memory, so that you merely have to store a “pointer” to the representation of the aggregate item in memory, instead of needing to remember a bunch of individual things. For example, if I ask you to remember the letters C, A, T, A, S, T, R, O, P, H, and E, you will probably just make a “pointer” to the word “catastrophe” in your mind. If, on the other hand, I ask you to remember the letters S, I, T, O, W, A, J, C, L, B, and M, that will probably be much more difficult for you.)

Vessey says that higher chunking ability will manifest itself in smoother debugging, which she then says will be shown by the “count of X” measures as described above, but doesn’t justify that assertion. She frequently conflates “chunking ability” with the count of X, as if she had fully proven it. I don’t think she did, so her conclusions about chunking ability are off-base.

One thing that the paper notes is that in general, the novices tended to be more rigid and inflexible about their hypotheses. If they came up with a hypothesis, they stuck with it for too long. (There were also two novices who didn’t generate hypotheses, and basically just kept trying things somewhat at random.) This is consistent with what I’ve seen in other papers.


Breadth-first search

Posted in programmer productivity at 4:38 pm by ducky

As I mentioned before, I saw two coders working on the same task using the same navigational style take very very different lengths of time. While tracing forward through code, one happened to pick the correct choice from two equally plausible choices; one happened to pick the wrong one and get off into the weeds.

This makes me even more convinced that the important thing when coding is to not get stuck. It also said to me that a more breadth-first search (BFS) strategy (where you explore each path for a little ways before examining any path in depth) was a better idea than depth-first search (DFS). This got me thinking about why my pal didn’t do BFS, why he spent 30 minutes off in the weeds.
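For what it’s worth, here is a toy sketch of the two strategies over a made-up call graph (the method names are invented, and real code navigation is far messier than a clean graph — this just shows the order in which each strategy visits things):

```java
import java.util.*;

// Sketch: exploring a call graph breadth-first vs depth-first.
// The graph and method names are hypothetical, for illustration only.
public class Traversal {
    static Map<String, List<String>> calls = Map.of(
        "main", List.of("parse", "render"),
        "parse", List.of("readFile"),
        "render", List.of("writeOutput"),
        "readFile", List.of(),
        "writeOutput", List.of());

    // BFS: peek at every callee one level at a time before going deeper.
    static List<String> bfs(String start) {
        List<String> order = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>(List.of(start));
        while (!queue.isEmpty()) {
            String m = queue.removeFirst();
            order.add(m);
            for (String callee : calls.get(m))
                if (seen.add(callee)) queue.addLast(callee);
        }
        return order;
    }

    // DFS: follow one call chain to its end before backtracking.
    static void dfs(String m, Set<String> seen, List<String> order) {
        if (!seen.add(m)) return;
        order.add(m);
        for (String callee : calls.get(m)) dfs(callee, seen, order);
    }

    public static void main(String[] args) {
        System.out.println(bfs("main")); // [main, parse, render, readFile, writeOutput]
        List<String> order = new ArrayList<>();
        dfs("main", new HashSet<>(), order);
        System.out.println(order);       // [main, parse, readFile, render, writeOutput]
    }
}
```

The BFS order touches both of main’s callees before committing to either — which is exactly the “explore each path a little ways” habit that keeps you out of the weeds.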

Eclipse is really bad at supporting BFS. Some of the things it does are generically bad, but some are badnesses all its own.

  • The tabs jump around a lot. Eclipse currently has a strange way of dealing with its tabs. There is a way to get tabbing behavior where it doesn’t jump around a lot — more like Firefox, but it is not the default. (Window->Preferences->General->Editors-> put a check in “Close editors automatically:”, set “Number of opened editors before closing” to 1, and select the radio button “Open new editor”. Now there will be a little “pin” icon to the right of the forward-history button that you can use to say “Don’t close this pane”.)
  • Eclipse doesn’t have a per-tab/page history like Firefox does. All your history gets jumbled together. This means that you can’t use tabs/pages to help you keep track of individual tasks.
  • It’s difficult to mark points for future reference. There are bookmarks in Eclipse, but most people don’t know about them. Even if you are one of the few and proud who knows about bookmarks, they are hard to use. It would be nice if in the history (which of course would be per-tab), you could see where you had bookmarks.

Many of the real stud-god hackers still use vi and emacs, and after doing all this thinking about BFS, I can see why. Emacs and vi plus xterms have really good facilities for keeping track of various streams/stacks of code traversals. If you decide you are at a decision point, where you don’t know which of two paths you should take, just open up a new xterm/editor, load up the file, and move that screen off to the side. Move it somewhere where you can still see it and remember that it is there, but where it isn’t in your way.

Inside each editor session, the editor provides good tools for quickly going forwards or backwards in the code, and that history never gets muddied with any other path that you are traversing.


comparative programming linguistics

Posted in Hacking, programmer productivity, Technology trends at 8:26 pm by ducky

I have seen a lot of discussion over the years of the relative strengths (or weaknesses) of specific languages. People talk about why they use this language or that language, and they seem to always talk about specific linguistic constructs of their favored language.

“Mine has generics!” “Mine has macro expansion!” “Mine has closures!”

Sometimes the various devotees will give a nod to the richness of their language’s libraries, or to the robustness of their compiler, but rarely.

Recently, I’ve been working on a hobby project in PHP while reading up on tools like odb, JML, Daikon, ESC/Java2, javaspider, and EclEmma. The contrast is stark.

PHP is stable enough, and seems to have plenty of libraries, but PHPEclipse is quite downrev compared to Eclipse, the debugger doesn’t work at all for me (and I don’t know where to start troubleshooting), and there are essentially no additional tools. I feel like a starving dieter writing reviews for a gourmet food magazine: shackled to PHP and pining for the abundance of the Java tools.

Java’s advantages in the tool arena aren’t accidental.

  • Its static typing and lack of pointers make a lot of tools easier to write.
  • Having no pointers makes it easier to teach, so undergraduate classes are now usually taught in Java, which means that the grad students tend to use Java when they research new tools.
  • The Eclipse IDE, being both open source and supported by IBM, makes it a great platform for tool development.

I am just about ready to swear fealty to Java, purely because of the richness of the third-party programming toolset.

software tools: EclEmma

Posted in Eclipse, programmer productivity at 8:09 pm by ducky

In a previous post, I said that I thought it would be handy to have your source editor color code based on which lines were executed in the prior execution of the code. I mused about merging Eclipse with a profiler in that post, but later realized that I could also use a code coverage tool… and then discovered someone had already done it. EclEmma is a fine code coverage tool that is nicely integrated with Eclipse and does exactly what I want.

EclEmma isn’t positioned as a debugging tool, but it sure can be used as one.

"Chunking" — from Vessey

Posted in programmer productivity at 9:37 am by ducky

From Iris Vessey’s Expertise in Debugging Computer Programs: An Analysis of the Content of Verbal Protocols:

“Experts have more and/or larger knowledge structures in long-term memory, which they build up by the process of chunking. … Chunking refers to the concept whereby humans can effectively increase the capacity of short-term memory by grouping related items into a “chunk,” storing the chunk in long-term memory and the pointer to the chunk in short-term memory. For example, most of us would store KBD as three chunks, while we would store CAT as one since we perceive it as a unit, i.e., as an animal. (See Simon [39].) The importance of knowledge structures to expertise was first established by de Groot [46] and Chase and Simon [22] in their studies of expert and novice chess players. This work has since been replicated in the programming domain by Shneiderman [47] and McKeithen et al. [48].”

This sounds to me like an excellent reason to read Design Patterns.


software tools: JML and Daikon

Posted in Hacking, programmer productivity at 12:40 pm by ducky

From the name, I had thought that the Java Modeling Language (JML) was going to be some specialized variant of UML. I haven’t worked with UML, but what I see other people doing with it is drawing pictures.

Instead, JML turns out to be a specification for putting assertions about the code’s behaviour in comments. In that way, it is much more similar to Javadoc than to UML.

With both Javadoc and JML, the author of a method makes promises about what the method will do and what the method requires. In Javadoc, for example, the @return tag is a promise about what the method will return, and @param foo says what the method expects of the parameter foo.

With Javadoc, however, the promises are pretty minimal and the requirements are all stated elsewhere (like in the method definition). With JML, the promises about post-conditions and requirements on the pre-conditions can be much more elaborate. The programmer can promise that after the method finishes, for example:

  • a particular instance variable will be less than and/or greater than some value
  • an output sequence will be sorted in ascending order, or
  • variable foo will be bigger than when the method started.

The programmer can also state very detailed input requirements, like that an instance variable can’t be null, that an input must be in a certain range, or that the sum of foo and bar must be less than baz.

This rigorous definition of pre- and post-conditions is useful for documentation. The next programmer doesn’t have to read through the entire method to figure out that foo can’t be null, for example.
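Here is a hypothetical example of what such annotations look like (the Account class and its numbers are invented; requires, ensures, \old, and \result are JML’s actual keywords, written in //@ annotation comments that a plain Java compiler simply ignores):

```java
// Sketch of JML-style pre- and post-conditions on an invented class.
public class Account {
    private int balance;

    public Account(int openingBalance) { balance = openingBalance; }

    //@ requires amount > 0 && amount <= balance;
    //@ ensures balance == \old(balance) - amount;
    //@ ensures \result == balance;
    public int withdraw(int amount) {
        balance -= amount;
        return balance;
    }

    public static void main(String[] args) {
        System.out.println(new Account(100).withdraw(30)); // prints 70
    }
}
```

The requires clause is the caller’s obligation (no overdrafts, no zero or negative withdrawals); the ensures clauses are the method’s promise, with \old(balance) referring to the balance before the call.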

Additionally, the JML spec is rigorous enough that it can be used with a variety of interesting and useful tools. With a special compiler (jmlc), pre- and post-condition checks can get compiled into the code. Thus if someone calls a method with a parameter outside the allowed bounds, the code can assert an error. (The assertions can also be turned off for production code if so desired.)

But wait, there’s more! The specs are rigorous enough that a lot of checking can be done at compile time. If method A() promises that its output will be greater than 3, and method B() says that it requires the output to be greater than 5, then B(A()) would give a warning: A can give output (between 3 and 5) that B would gag on. See ESC/Java2.
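A sketch of that mismatch (the method bodies are invented; a checker like ESC/Java2 reasons from the annotations alone, not from the bodies):

```java
// Sketch: a() promises less than b() requires, which a static
// checker can flag at compile time. Bodies are invented.
public class EscDemo {
    //@ ensures \result > 3;
    static int a() { return 4; }

    //@ requires x > 5;
    static int b(int x) { return x * 2; }

    public static void main(String[] args) {
        // b(a()): a() only promises > 3, but b() requires > 5, so a
        // static checker warns — a() could legally return 4 or 5.
        System.out.println(b(a()));
    }
}
```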

But wait, there’s more! The JML annotations can be used to create unit tests. The jmlunit tool writes tests that wiggle the input parameters over the legal inputs to a method, and checks to see if the outputs are correct.

There’s the small problem that it’s a pain to write all the pre- and post-conditions. Fortunately, there’s a tool (Daikon) which will help you with that. Daikon watches as you run a program, sees what changes and what doesn’t, and from that generates promises/requirements. Note that those might not be correct: if there are bugs in your program, or if that execution of your program didn’t hit all the corner cases, the generated conditions will be wrong. However, it will give you a good start, and I find that it is easier to spot mistakes in somebody else’s stuff than it is to spot omissions in things that I did.
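Daikon’s actual machinery is far more sophisticated, but the core idea — propose candidate invariants, then discard any that an observed execution violates — can be sketched in a few lines (the candidates and names here are invented):

```java
import java.util.*;
import java.util.function.IntPredicate;

// Toy sketch of Daikon's core idea: start with candidate invariants
// over a variable x, and keep only those every observed value satisfies.
public class InvariantGuess {
    static Map<String, IntPredicate> candidates = new HashMap<>(Map.of(
        "x > 0",  x -> x > 0,
        "x >= 0", x -> x >= 0,
        "x == 0", x -> x == 0));

    static Set<String> survivors(int[] observed) {
        Set<String> keep = new HashSet<>(candidates.keySet());
        for (int x : observed)
            keep.removeIf(name -> !candidates.get(name).test(x));
        return keep;
    }

    public static void main(String[] args) {
        // Values of x seen while the program ran: 3 kills "x == 0",
        // so only "x > 0" and "x >= 0" survive as candidate invariants.
        System.out.println(survivors(new int[]{3, 7, 1}));
    }
}
```

This also shows why the output is only as good as the runs: if the test run never produces x == 0, the stronger (and possibly wrong) invariant x > 0 survives alongside x >= 0.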

This is all way cool.


software tools: omniscient debugger

Posted in Hacking, programmer productivity at 9:41 am by ducky

Bil Lewis has made odb, an “omniscient debugger”. It saves every state change during the execution of a program, then lets you use the debugger to step forward through the execution or backwards. You don’t have to guess about why a variable has the value it does; you can quickly jump to the last place it was set.
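The core idea is easy to sketch (a toy version with invented names; the real odb records vastly more state, and much more efficiently):

```java
import java.util.*;

// Toy sketch of omniscient debugging: log every write to a variable
// with a step number, so "where was x last set?" becomes a lookup
// instead of a guess.
public class History {
    record Write(int step, String var, int value) {}

    private final List<Write> log = new ArrayList<>();
    private int step = 0;

    void write(String var, int value) {
        log.add(new Write(step++, var, value));
    }

    // The last write to `var` at or before `atStep` — i.e. the place a
    // backwards-stepping debugger would jump to.
    Optional<Write> lastWrite(String var, int atStep) {
        Write found = null;
        for (Write w : log)
            if (w.var().equals(var) && w.step() <= atStep) found = w;
        return Optional.ofNullable(found);
    }
}
```

Given writes x=1, y=2, x=5 at steps 0, 1, 2, asking lastWrite("x", 2) jumps straight to the step-2 assignment — no re-running the program with extra print statements.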

This seems like an extremely useful way to do things! Bil Lewis wonders why more people aren’t using his better mousetrap: “(I don’t understand why. I’m clever, I’m friendly, I’m funny. Why don’t people go for my stuff? I dunnknow.) ”

I like it, but…

  • Its installation is a bit fragile. If you don’t know exactly what you are doing and do everything exactly right, there isn’t a whole lot of help to get you back on track. (If you are a Java stud and don’t make any mistakes, the installation is straightforward.)
  • It is not an IDE. It is a stand-alone tool. Yes, its web page says that it can be launched from inside Eclipse, but it sure looks like that just spawns it as a separate process: it doesn’t look like it plugs into the standard Eclipse debugger. That means that it has a separate learning curve and fewer features.
  • I worry that for programs larger than “toy” programs, the amount of data that it will have to collect will overwhelm my system’s resources. Will I be able to debug Eclipse, for example?

Now, if it were tightly integrated with Eclipse, like the internal C simulation tool that Cisco presented at EclipseCon, I would be All. Over. It. As it is, I’m probably going to use odb, but only occasionally.

Update 21 May 2007: There is an omniscient version of gdb! It’s called UndoDb.

Update 28 July 2008: There is a commercial Java omniscient debugger called Codeguide, and a research one called TOD (Trace-Oriented Debugger). Also, Andrew Ko’s Designing the Whyline describes a user study of an omniscient debugger, although the fact that it is an omniscient debugger is kind of buried.


robobait: software tools: how to use javaspider

Posted in Hacking, programmer productivity, robobait at 1:50 pm by ducky

I’m starting to look at a bunch of software engineering tools, particularly those that purport to help debugging. One is javaspider, which is a graphical way of inspecting complex objects in Eclipse. It looks interesting but has almost no documentation on the Web. I’m not sure that it’s something I would use a lot, but in the spirit of making it easier to find things, here is a link to a brief description of how to use javaspider.

Keywords for the search engines: manual docs instructions javaspider inspect objects Eclipse


false hypotheses

Posted in programmer productivity at 10:41 pm by ducky

A theme in Andrew Ko’s papers (which I’ve read a lot of recently) is that a lot of the problems that programmers have are due to making invalid hypotheses. For example, in Designing the Whyline (referring to an earlier paper, which I didn’t read closely), they say “50% of all errors were due to programmers’ false assumptions in the hypotheses they formed while debugging existing errors.”

It seemed that chasing down an incorrect hypothesis could also chew up a lot of time. I suspect that when coders get stuck, it’s because they have spent too long chasing down the wrong trail.

Yesterday, I had two friends take a quick, casual look at some code to see if they could quickly find where certain output was written out. Both worked forward in the code, starting at main() and tracing through calls to see where control went. At one point in the source, there were three different paths that they could have chosen to trace. The first person chose to follow statement A; the second person chose to follow statement B. Both were reasonable choices, but A happened to be the correct one.

The first person took ten minutes, while the second person spent 30 minutes running off into the weeds chasing statement B. (He eventually gave up, backtracked, and took about ten minutes following statement A to the proper location.)

Implications for programmer productivity measures

The second person took four times as long as the first to complete the task. Was the first person a four times “better” programmer? I don’t think so. From my looking at the code, the second person made a completely legitimate choice. I’m quite happy to believe that on some other task, the first person might make the wrong choice at first and the second person make the right choice.

This makes me even more suspicious of people claiming that there is a huge productivity difference among programmers. The controlled studies that I have seen have all had a very small number of tasks, far too few to make significant generalizations about someone’s long-term abilities.

Furthermore, I think there is sample bias. For a user study, you have to have very simple problems so that people have a chance to finish the allotted tasks in a few hours or less. That favors people who do “breadth-first” analyses of code; who spend a tiny bit of time on one hypothesis, and if that doesn’t quickly give results, move on to the next one.

However, sometimes problems really are gnarly and hairy, and you really do have to stick to one hypothesis for a long time. People who are good at sticking to the one hypothesis through to resolution (without getting discouraged) have value that wouldn’t necessarily be recognized in an academic user study of programmer speed.

How can we reduce false hypotheses?

After a binge of reading Andrew Ko papers last week, I decided to start forcing myself to write down three hypotheses every time I had to make a guess as to why something happened.

In my next substantive coding session, there were four bugs that I worked on. For two of them, I thought of two hypotheses quickly, but then was stumped for a moment as to what I could put for a third… so I put something highly unlikely. In one case, for example, I hypothesized a bug in code that I hadn’t touched in weeks.

Guess what? In both of those cases, it was the “far-fetched” hypothesis that turned out to be true! For example, there was a bug in the code that I hadn’t touched in weeks: I had not updated it to match some code that I’d recently refactored.

While it’s too early for me to say what the long-term effect of writing down three hypotheses will be, in the limited coding I’ve done since I started, it sure feels like I’m doing a much better job of debugging.
