I have finished my MS thesis, Path Exploration during Code Navigation!
Here is a summary of what I learned during my two years of research; the thesis covers most but not all of the following:
I started out at UBC asking what good programmers do that bad programmers don’t. That raises the question, what is “good”? Good clearly has a time-to-complete component, but also a quality component. I looked around and couldn’t find a worthwhile quality measure, alas, so settled for looking for speed measures. I found some, talked about them on my blog, and summarized them in the first part of my VanDev talk. The big take-away for programmers is “Don’t get stuck!” (Note: this part is not written up in my thesis.)
Don’t get stuck!
How do you not get stuck? The literature seemed to imply that less-effective problem solvers in a domain (not just CS) tend to stick to one hypothesis for way too long, only abandoning it when they hit a dead end. The literature sure seemed to say that a shallower search over more paths (hypotheses) was better than searching one path more deeply. In CS lingo, a more breadth-first search (BFS) apparently is more effective than a more depth-first search (DFS).
This is consistent with a large body of research in Psychology about confirmation bias. If you have one hypothesis in your head, you will tend to over-believe evidence that agrees with that hypothesis and under-believe evidence that does not agree with that hypothesis. (For example, if you believed that Saddam Hussein had weapons of mass destruction and was trying to get more, you’d tend to believe reports that Hussein was trying to buy yellowcake from Niger and discount reports by UN weapons inspectors that he did not have WMDs.) There was a really neat paper, “Dual Space Search During Scientific Reasoning”, which found that people who were given a few minutes dedicated to coming up with as many hypotheses as they could solved a specific problem much, much faster than a control group that started problem-solving immediately.
If you think of exploration paths as hypotheses of the form, “If I explore down this path, I will find what I’m looking for”, then this says that you would want to keep a few exploration paths in play at a time. You wouldn’t want to try to explore them all simultaneously, but you’d want them in the back of your mind to keep you from the trap of confirmation bias.
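To make the BFS/DFS distinction concrete, here is a minimal sketch in Java. The tree, the node names, and the class `ExplorationOrder` are all made up for illustration; they are not from the thesis. Each node stands for a place in the code, and its children are the places you could navigate to from there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class ExplorationOrder {
    // Hypothetical exploration tree: each key is a place in the code,
    // each value lists the places you could navigate to from there.
    static final Map<String, List<String>> TREE = Map.of(
        "start", List.of("A", "B"),
        "A", List.of("A1", "A2"),
        "B", List.of("B1"),
        "A1", List.of(),
        "A2", List.of(),
        "B1", List.of());

    // Depth-first: follow one path to its end before trying a sibling path.
    static List<String> depthFirst(String root) {
        List<String> visited = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            visited.add(node);
            List<String> children = TREE.get(node);
            // Push in reverse so that the first child is explored first.
            for (int i = children.size() - 1; i >= 0; i--) {
                stack.push(children.get(i));
            }
        }
        return visited;
    }

    // Breadth-first: look at every path one level down before going deeper.
    static List<String> breadthFirst(String root) {
        List<String> visited = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            visited.add(node);
            queue.addAll(TREE.get(node));
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("DFS: " + depthFirst("start"));   // DFS: [start, A, A1, A2, B, B1]
        System.out.println("BFS: " + breadthFirst("start")); // BFS: [start, A, B, A1, A2, B1]
    }
}
```

In the depth-first order, path B is not even looked at until everything under A is exhausted. In the breadth-first order, both A and B are "in play" before either is explored deeply.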
Tab support for Breadth-first-search
I noticed that Firefox tabs were much better than Eclipse tabs at helping me keep track of different exploration paths. Firefox lets me open a bunch of search results in new tabs (putting those “hypotheses” in the back of my mind) and then, once I’ve opened all the search results I’m interested in, explore each in turn. Eclipse opens every file in a new tab, which doesn’t help you keep track of exploration paths. (Imagine if Firefox opened every Web page in a new tab. That would suck.)
Armed with papers suggesting that breadth-first search (BFS) was better than depth-first search (DFS), I made a modified version of Eclipse, called Weta (for WEblike TAbbing), that had Firefox-style tabbing, then ran a user study. Because I specifically wanted to see if a more BFS-ish approach would help, I set up the study like this:
- The subjects did two programming tasks with stock Eclipse.
- I told them about the research that said that BFS was better.
- I showed them how to do a more BFS-ish navigation with Weta.
- The subjects did two tasks with Weta.
Most of the subjects pretty much loved the idea of Weta, but none of them ever used Weta to keep track of multiple exploration paths, different branches in the exploration tree. They used Weta, but they used it to mark places on the main trunk for them to come back to later. They’d open the declaration of an element in a new tab, immediately switch to the new tab, and continue down that path.
Why didn’t they use BFS?
Why didn’t they use BFS in the way that I had trained them? Several possibilities:
- Time? Maybe they didn’t have enough time to get used to using Weta. After using Eclipse for years, maybe two twenty-minute tasks just didn’t give them enough time to adjust to a new way of doing things.
- Complexity? Maybe navigating code carries such a high cognitive load that switching paths frequently costs more than it is worth. Maybe Web navigation is easy enough that switching is cheap and BFS pays off.
- In twenty lines of a Web page, you probably will only have two or three links to other Web pages, but in twenty lines of Java source you will probably have twenty or thirty Java elements that all have relationships to other Java elements.
- Web pages only have one kind of link; Java elements have multiple kinds of relationships to each other (calls, is called by, inherits, and implements).
- Code is harder to read than Web pages (assuming that you read the language the page is in). You don’t have to worry about conditionals and exceptions in Web pages.
- Confirmation bias, not BFSness? Maybe what is really important is that people not have confirmation bias, not that they use a particular strategy. Maybe just writing down three ideas for what is the root of the problem would be simpler and would require less effort.
- Bookmarks? Maybe developers wanted/needed bookmarks even more than they wanted BFS tools. In the Web navigation literature, the code navigation literature, and casual conversations with my friends and colleagues, I kept hearing that the list of bookmarks gets so long that it becomes unwieldy.
- Mylyn is an Eclipse plug-in that is all about hiding information that you don’t need, so I asked the Mylyn team to hide all the bookmarks except those that were set during the currently-active Mylyn Task. They did, and they actually finished before my user study. Unfortunately, I didn’t realize how important bookmarks might be to my study, so I didn’t bring it up. (That feature of Mylyn was new enough that even the four regular Mylyn users in my study didn’t know about it.)
It would be interesting to see what people did if they had Weta for a longer term, and also to see what they do with bookmarks if given the Mylyn bookmark enhancement.
Other factors affecting getting stuck
While it wasn’t what the study was designed for, I noticed several things that multiple subjects had trouble with. One type of difficulty was Eclipse-specific, and one was not.
- Search. The Java Search dialog and the Find dialog both tripped people up, especially the Java Search dialog. I put in an enhancement request for a better UI for the Java Search dialog that has been assigned (i.e., the Eclipse team will probably do it). I also put in a request for a better Find dialog, but they indicated that they won’t fix it.
- Lack of runtime information. Static tracing frequently took the users to different places than the program actually went. I give a handful of examples in the writeup, but here’s one:
- In one of the tasks, users needed to do something with code that involved a GUI element that had the text “Active View Size” in it. They all did a search for “Active View Size” and went to a method in the class DrawApplication. They then did static tracing inside DrawApplication, following relationships from Java elements to other elements. It pretty much kept them inside the class DrawApplication. However, DrawApplication was a superclass of a superclass of the class that was actually run when reproducing the bug! Three of the seven subjects never noticed that.
I think the discrepancy between runtime behaviour and static tracing is important. Subjects spent a lot of time trying to figure out what was executed, or went on wild goose chases because they got the wrong idea about how the code executed.
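Here is a stripped-down sketch of how that kind of mismatch can arise in Java. The class and method names (`BaseApplication`, `MiddleApplication`, `RunningApplication`, `describeView`, `viewSize`) are invented for illustration and are not the code from the study; the only thing borrowed from the task is the string “Active View Size” and the superclass-of-a-superclass shape.

```java
public class DispatchDemo {
    static class BaseApplication {
        // A text search for "Active View Size" lands here,
        // so static tracing starts (and tends to stay) in this class.
        String describeView() {
            return "Active View Size: " + viewSize();
        }

        // Reading the code statically, this looks like the method that runs.
        String viewSize() {
            return "base";
        }
    }

    // An intermediate class, so that BaseApplication is a superclass of a superclass.
    static class MiddleApplication extends BaseApplication { }

    // The class that is actually instantiated when the program runs.
    static class RunningApplication extends MiddleApplication {
        @Override
        String viewSize() {
            return "overridden";
        }
    }

    public static void main(String[] args) {
        BaseApplication app = new RunningApplication();
        // The call inside describeView() dispatches on the runtime type,
        // so the override runs, not the method next to the search hit.
        System.out.println(app.describeView()); // prints "Active View Size: overridden"
    }
}
```

Someone who follows `viewSize()` from inside `BaseApplication` ends up in `BaseApplication.viewSize()`, while the running program goes to the override two levels down. Nothing in the base class hints that the override exists.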
In addition to the little suggestions I had about the Java Search Dialog and the Find Dialog, I suggest that the IDE be enhanced to give developers visual information about what code is involved in the reproduction of a bug. I’ve talked about this wonder tool before, but I think I can be more concise here:
- Let the developer set things similar to breakpoints that mark which part of a run is broken. Set a “start logging” breakpoint before things get messed up (or at the start of the code by default); set a “stop logging” breakpoint after things are known to be messed up.
- Colour the source code background based on whether the messed-up code was fully, partially, or not executed. (EclEmma already does this, but does it for the entire run, not for selected sections of the program execution.)
- Use Mylyn to (optionally) hide the Java elements (classes, methods, interfaces) that were not executed at all during the messed-up section of code. Now instead of having to search through maybe three million lines of code to understand the bug, they’d only have to search through maybe three hundred lines.
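The core of that idea can be sketched in a few lines of Java. This is only a toy model under my own assumptions: `ExecutionWindow` and its methods are hypothetical names, not the EclEmma or Mylyn APIs, and a real tool would get its “this element ran” events from instrumentation rather than from hand-written calls. It also only distinguishes executed from not executed, not the fully/partially split.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExecutionWindow {
    private boolean logging = false;
    private final Set<String> executed = new LinkedHashSet<>();

    // Stand-ins for the "start logging" and "stop logging" breakpoints.
    void startLogging() { logging = true; }
    void stopLogging() { logging = false; }

    // Called whenever a Java element runs; recorded only inside the window.
    void onExecuted(String element) {
        if (logging) {
            executed.add(element);
        }
    }

    // Elements to keep visible: those that ran inside the window.
    // Everything else is a candidate for hiding.
    List<String> visibleElements(List<String> allElements) {
        List<String> visible = new ArrayList<>();
        for (String element : allElements) {
            if (executed.contains(element)) {
                visible.add(element);
            }
        }
        return visible;
    }

    // A made-up run: only the two View methods execute inside the window.
    static List<String> demo() {
        ExecutionWindow window = new ExecutionWindow();
        window.onExecuted("Setup.init");     // before the window: ignored
        window.startLogging();
        window.onExecuted("View.resize");
        window.onExecuted("View.paint");
        window.stopLogging();
        window.onExecuted("Shutdown.close"); // after the window: ignored
        return window.visibleElements(List.of(
            "Setup.init", "View.resize", "View.paint", "Shutdown.close", "Unused.helper"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [View.resize, View.paint]
    }
}
```

Of the five elements in the made-up program, only the two that ran between the start and stop markers stay visible, which is the same narrowing (on a much smaller scale) as going from millions of lines to a few hundred.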
If you want more details, please see the full thesis, comment here, or shoot me an email message (ducky at webfoot dot com).