There’s a cool paper on a tool to do semi-automatic debugging: Triage: diagnosing production run failures at the user’s site. While Triage was designed to diagnose bugs at a customer site (where the software developers don’t have access to either the configuration or the data), I think a similar tool would be very valuable even for debugging in-house.
They use a number of different techniques to debug C++ code.
- Checkpoint the code at a number of steps.
- Attempt to reproduce the bug. This tells whether it is deterministic or not.
- Analyzes the memory by walking the heap and stack to find possible corruptions.
- Roll back to previous checkpoints and rerun, looking for buffer overflows, dangling pointers, double frees, data races, semantic bugs, etc.
- Fuzz the inputs: intentionally vary the inputs, thread scheduling, memory layouts, signal delivery, and even control flows and memory states to narrow the conditions that trigger the failure for easy reproduction
- Compare the code paths from failing replays and non-failing replays to determine what code was involved in that failure.
- Generate a report. This gives information on the failure and a suggestion of which lines to look at to fix it.
They did a user study and found that programmers took 45% less time to debug when they used Triage than when they didn’t for “real” bugs, and 18% for “toy” bugs. (“…although Triage still helped, the effect was not as large since the toy bugs are very simple and straightforward to diagnose even without Triage.”)
It looks like the subjects were given the Triage bug reports before they started work, so the time that it takes to run Triage wasn’t factored into the time it took. The time it took Triage to run was significant (up to 64 min for one of the bugs), but presumably the Triage run would be done in background. I could set up Triage to run while I went to lunch, for example.
This looks cool.
I have finished my MS thesis, Path Exploration during Code Navigation!
Here is a summary of what I learned during my two years of research; the thesis covers most but not all of the following:
I started out at UBC asking what good programmers do that bad programmers don’t. That raises the question, what is “good”? Good clearly has a time-to-complete component, but also a quality component. I looked around and couldn’t find a worthwhile quality measure, alas, so settled for looking for speed measures. I found some, talked about them on my blog, and summarized them in the first part of my VanDev talk. The big take-away for programmers is “Don’t get stuck!” (Note: this part is not written up in my thesis.)
Don’t get stuck!
How do you not get stuck? The literature seemed to imply that less-effective problem solvers in a domain (not just CS) tend to stick to one hypothesis for way too long, only abandoning it when they hit a dead end. The literature sure seemed to say that a shallower search over more paths (hypotheses) was better than searching one path more deeply. In CS lingo, a more breadth-first search (BFS) apparently is more effective than a more depth-first search (DFS).
This is consistent with a large body of research in Psychology about confirmation bias. If you have one hypothesis in your head, you will tend to over-believe evidence that agrees with that hypothesis and under-believe evidence that does not agree with that hypothesis. (For example, if you believed that Saddam Hussein had weapons of mass destruction and was trying to get more, you’d tend to believe reports that Hussein was trying to buy yellowcake from Niger and discount reports by UN weapons inspectors that he did not have WMDs.) There was a really neat paper Dual Space Search During Scientific Reasoning which found that giving people a few minutes dedicated to coming up with as many hypotheses as they could meant that they solved a specific problem much, much faster than a control group that started problem-solving immediately.
If you think of exploration paths as hypotheses of the form, “If I explore down this path, I will find what I’m looking for”, then this says that you would want to keep a few exploration paths in play at a time. You wouldn’t want to try to explore them all simultaneously, but you’d want them in the back of your mind to keep you from the trap of confirmation bias.
Tab support for Breadth-first-search
I noticed that Firefox tabs were much better at helping me keep track of different exploration paths than Eclipse tabs did. Firefox lets me open a bunch of search results in new tabs — putting those “hypotheses” in the back of my mind — and then, once I’ve opened all the search results I’m interested in, explore each in turn. Eclipse opens every file in a new tab, which doesn’t help you keep track of exploration paths. (Imagine if Firefox opened every Web page in a new tab. That would suck.)
Armed with papers that suggest that breadth-first-search (BFS) was better than depth-first-search (DFS), I made a modified version of Eclipse, called Weta (for WEblike TAbbing), so that it had Firefox-style tabbing, then ran a user study. I specifically wanted to see if a more BFS-ish approach would help, I set up a user study like this:
- The subjects did two programming tasks with stock Eclipse.
- I told them about the research that said that BFS was better.
- I showed them how to do a more BFS-ish navigation with Weta
- The subjects did two tasks with Weta.
Most of the subjects pretty much loved the idea of Weta, but none of them ever used Weta to keep track of multiple exploration paths, different branches in the exploration tree. They used Weta, but they used it to mark places on the main trunk for them to come back to later. They’d open the declaration of an element in a new tab, immediately switch to the new tab, and continue down that path.
Why didn’t they use BFS?
Why didn’t they use BFS in the way that I had trained them? Several possibilities:
- Time? Maybe they didn’t have enough time to get used to using Weta. After using Eclipse for years, maybe two twenty-minute tasks just didn’t give them enough time to adjust to a new way of doing things.
- Complexity? Maybe the cognitive load of navigating code is so high that the cost of switching paths is high enough that switching frequently isn’t worth it. Maybe Web navigation is easy enough that the switching cost is low enough that BFS is worth it.
- In twenty lines of a Web page, you probably will only have two or three links to other Web pages, but in twenty lines of Java source you will probably have twenty or thirty Java elements that all have relationships to other Java elements.
- Web pages only have one kind of link; Java elements have multiple kinds of relationships to each other (calls, is called by, inherits, and implements).
- Code is harder to read than Web pages (assuming that you read the language the page is in). You don’t have to worry about conditionals and exceptions in Web pages.
- Confirmation bias, not BFSness? Maybe what is really important is that people not have confirmation bias, not that they use a particular strategy. Maybe just writing down three ideas for what is the root of the problem would be simpler and would require less effort.
- Bookmarks? Maybe developers wanted/needed bookmarks even more than they wanted BFS tools. In the Web navigation literature, the code navigation literature, and casual conversations with my friends and colleagues, I kept hearing that the list of bookmarks gets so long that it becomes unwieldy.
- Mylyn is an Eclipse plug-in that is all about hiding information that you don’t need, so I asked the Mylyn team to hide all the bookmarks except those that were set during the currently-active Mylyn Task. They did, and they actually finished before my user study. Unfortunately, I didn’t realize the possible importance of bookmarks on my study, so didn’t bring it up. That feature of Mylyn was new enough that even the four regular Mylyn users in my study didn’t know about it.)
It would be interesting to see what people did if they had Weta for a longer term, and also to see what they do with bookmarks if given the Mylyn bookmark enhancement.
Other factors affecting getting stuck
While it wasn’t what the study was designed for, I noticed several things that multiple subjects had trouble with. One type of difficulty was Eclipse-specific, and one was not.
- Search. The Java Search dialog and the Find dialog both tripped people up, especially the Java Search dialog. I put in an enhancement request for a better UI for the Java Search dialog that has been assigned (i.e., the Eclipse team will probably do it). I also put in a request for a better Find dialog, but they indicated that they won’t fix it. 🙁
- Lack of runtime information. Static tracing frequently took the users them to different places than the program actually went. I give a handful of examples in the writeup, but here’s one:
- In one of the tasks, users needed to do something with code that involved a GUI element that had the text “Active View Size” in it. They all did a search for “Active View Size” and went to a method in the class DrawApplication. They then did static tracing inside DrawApplication, following relationships from Java elements to other elements. It pretty much kept them inside the class DrawApplication. However, DrawApplication was a superclass of a superclass of the class that was actually run when reproducing the bug! Three of the seven subjects never noticed that.
I think the discrepancy between runtime behaviour and static tracing is important. Subjects spent a lot of time trying to figure out what was executed, or went on wild goose chases because they got the wrong idea about how the code executed.
In addition to the little suggestions I had about the Java Search Dialog and the Find Dialog, I suggest that the IDE be enhanced to give developers visual information about what code is involved in the reproduction of a bug. I’ve talked about this wonder tool before, but I think I can be more concise here:
- Let the developer set things similar to breakpoints that mark which code that is broken during a run. Set a “start logging” breakpoint before things get messed up (or at the start of the code by default); set a “stop logging” breakpoint after things are known to be messed up.
- Colour the source code background based on whether the messed-up code was fully, partially, or not executed. (EclEmma already does this, but does it for the entire run, not for selected sections of the program execution.)
- Use Mylyn to (optionally) hide the Java elements (classes, methods, interfaces) that were not executed at all during the messed-up section of code. Now instead of having to search through maybe three million lines of code to understand the bug, they’d only have to search through maybe three hundred lines.
If you want more details, please see the full thesis, comment here, or shoot me an email message (ducky at webfoot dot com).
I did a very quick, informal survey on how people use tabs when looking at Web search results. Some people immediately open all the search results that look interesting in new tabs, then explore them one by one (“open-parallel). Others open one result in a new tab, explore it, go back to the search page, then open the second result in another tab, etc. (“open-sequentially”). Note that the “open-sequential” people can have lots of tabs open at a time, they just open them one by one.
To clarify, open-parallel means control-clicking on the URL for result #1, then on result #2, then on #3, then on #4, and only THEN and switching to the tab for #1, examining it, switching to the tab for #2, etc. Open-sequential means control-clicking on the URL for result #1, switching to the tab for #1, examining #1, switching to the search results page, control-clicking on #2, switching to the tab for #2, examining #2, switching to the search results page, etc.
I was surprised to find that the people who had been in the US in the early 2000’s were far more likely to use the open-parallel strategy. There was an even stronger correlation with geekdom: all of the geeks used the open-parallel, and only two of the non-geeks did.
||Where in early 00’s?
||Where in early 00’s?
||Working/studying in US
||Working/studying in Canada
||Working in US
||Studying in US
||Studying in US
||Studying in Canada
||Studying/working in US
||Studying in US
||Working in Australia(?)
||Working in Europe(?)
||Working in US
||University in Canada
||Working in Canada
||University in Canada
||University in US
||University in US
Notes on the survey:
- The subject pool is not representative of the general propulation: everyone who answered lives or lived at my former dorm at UBC, has a bachelor’s degree, and all but one have an advanced degree or are working on one.
- I classified people as geeks if they had had Linux on at least one of their computers and/or had worked in the IT industry. The person with the “sort-of” in the geek column doesn’t qualify on any of those counts, but was a minor Internet celebrity in the mid 90s.
What does this mean?
What does this mean? I’m not sure, but I have a few ideas:
- I suspect that the geeks are more likely to have used a browser with modern tabbing behaviour much earlier, so have had more years to adapt their strategies. (Internet Explorer got modern tabbing behaviour in 2006; Mozilla/Firefox got it in 2001.)
- One of the benefits of the open-parallel strategy is that pages can load in the background. Maybe in 2001, Web access was slower enough that this was important and relevant. Maybe it’s not that the geeks have been using tabs longer, but that they started using tabs when the Internet was slow.
- It might be that the geeks do more Web surfing than the non-geeks, so have spent more time refining their Internet behaviour.