06.07.07
Posted in Random thoughts at 6:08 pm by ducky
[Note: a later study says that vitamin D, while helpful against colon cancer, isn’t the miracle that the study I report on here says it is. Drat.]
Vitamin D — sunlight — turns out to be really really good for your health. “The one-fifth of premenopausal women who consumed the highest levels of vitamin D and calcium […] had a one-third reduced risk of developing breast cancer compared with those who consumed the least.” One third! That’s significant! Update: 60 to 70 percent lower!
There have been a number of other studies recently that have connected low vitamin D to heart disease, multiple sclerosis, and diabetes.
The easiest way to get vitamin D is from sunlight, but that doesn’t help people in northern climes like Canada and Scotland in the winter. There is vitamin D in milk, but if you don’t get any from sun (like in northern winters) you’d have to drink three litres per day to get enough. (I drink an unusually large quantity of milk, but even I only drink about a litre per day.)
Yes, it is true that more sun exposure increases the risk of skin cancer, but it turns out that skin cancer is much easier to notice, diagnose, and treat (being on the surface and all). It’s just not as big a problem: “Fifteen hundred Americans die every year from [skin cancers]. Fifteen hundred Americans die every day from the serious cancers.”
Unfortunately, it’s impossible to give a blanket recommendation of how much vitamin D supplement you should take. It depends on your latitude, the time of year, how much time you spend outside, and how dark your skin is. The lighter your skin, the more vitamin D you can make from sunlight. Also, vitamin D is fat-soluble, so it is possible to get too much.
So lay off that sunblock!
Permalink
Posted in Random thoughts at 5:44 pm by ducky
The hormone vasopressin appears to regulate nurturing behaviour in males — at least in some voles. Prairie voles are monogamous; mountain voles are wildly polygamous. If you give prairie voles vasopressin, that triggers nurturing. In mountain voles, it doesn’t — but it turns out that mountain voles are missing a receptor for vasopressin that the prairie voles have. If you give the mountain voles the receptor, they become nurturing. See the story.
Meanwhile, it has long been known that oxytocin makes females more nurturing.
(There is nothing particularly new about this, I just decided to start making links to things that I think are really interesting that other people don’t seem to know about.)
Permalink
06.05.07
Posted in programmer productivity, review at 4:07 pm by ducky
I really wanted to like Expertise in Debugging Computer Programs: A Process Analysis by Iris Vessey (IEEE Transactions on Systems, Man, and Cybernetics, Vol. 16, No. 5 (1986), pp. 621-637). Von Mayrhauser and Vans, in Program Comprehension During Software Maintenance and Evolution, indicated that Vessey had shown that experts use a breadth-first approach more than novices do.
I was a little puzzled, though, because Schenk et al., in Differences between novice and expert systems analysts: what do we know and what do we do?, cited the exact same paper but said that novices use a breadth-first approach more than experts! Clearly, they couldn’t both be right, but I was putting my money on Schenk just being wrong.
It also looked like the Vessey paper was going to be a good paper just in general, that it would tell me lots of good things about programmer productivity.
Well, it turns out that the Vessey paper is flawed. She studied sixteen programmers and asked them to debug a relatively simple program in the common language of the time, COBOL. (The paper was written in 1985.) It is not at all clear how they interacted with the code; I have a hunch that they were handed printouts of the code. It also isn’t completely clear what task termination looked like, but the paper says at one point “when told they were not correct”, so it sounds like the subjects said to Vessey “Okay, the problem is here” and Vessey either stopped the study or told them “No, keep trying.”
Vessey classified the programmers as expert and novice based on the number of times they did certain things while working on the task. (Side note: in the literature that I’ve been reading, “novice” and “expert” have nothing to do with how much experience the subjects have. They are euphemisms for “unskilled” and “skilled” or perhaps “bad” and “good”.)
She didn’t normalize by how long they took to finish the task; she just looked at the count of how many times they did X. There were three different measures for X: how often the subject switched which high-level debugging task they were doing (e.g. “formulate hypothesis” or “amend error”); how often they started over; and how often they changed where in the program they were looking.
She then noted that the experts finished much faster than the novices. Um. She didn’t seem to notice that the number of times you do X during the course of a task is going to correlate strongly with how much time you spend on the task. So basically, I think she found that people who finish faster, finish faster.
She also noted that the expert/novice classification was a perfect predictor of whether the subjects made any errors or not. Um, the time they took to finish was strongly correlated with whether they made any errors or not: if they made an error, they had to try again.
Vessey said that 15 of the 16 subjects could be classified as expert or novice by a combination of two factors: whether they used a breadth-first search (BFS) or a depth-first search (DFS) for a solution, and whether they used systems thinking or not. However, you don’t need both tests; the systems-thinking test alone accurately predicts 15 of 16. All eight of the experts used BFS and systems thinking, but half of the novices also used BFS, while only one of the novices used systems thinking.
Unfortunately, Vessey didn’t do a particularly good job of explaining what she meant by “systems thinking” or how she measured it.
Vessey also cited literature that indicated that the amount of knowledge in a programmer’s long-term memory affected how well they could debug. In particular, she said that chunking ability was important. (Chunking is a way to increase human memory capacity by re-encoding the data to match structures that are already in long-term memory, so that you merely have to store a “pointer” to the representation of the aggregate item in memory, instead of needing to remember a bunch of individual things. For example, if I ask you to remember the letters C, A, T, A, S, T, R, O, P, H, and E, you will probably just make a “pointer” to the word “catastrophe” in your mind. If, on the other hand, I ask you to remember the letters S, I, T, O, W, A, J, C, L, B, and M, that will probably be much more difficult for you.)
Vessey says that higher chunking ability will manifest itself in smoother debugging, which she then says will be shown by the “count of X” measures described above, but she doesn’t justify that assertion. She frequently conflates “chunking ability” with the count of X, as if she had fully proven the connection. I don’t think she did, so her conclusions about chunking ability are off-base.
One thing that the paper notes is that in general, the novices tended to be more rigid and inflexible about their hypotheses. If they came up with a hypothesis, they stuck with it for too long. (There were also two novices who didn’t generate hypotheses, and basically just kept trying things somewhat at random.) This is consistent with what I’ve seen in other papers.
Permalink
Posted in Canadian life at 10:47 am by ducky
I mentioned before that low-level government functionaries seem to have more discretion in Canada than in the US. Since then, I’ve seen a few more examples of this.
- We heard a story about a company hiring a huge (HUGE) truck to pick something up in Canada and bring it to the US. Aspen trucks are built in Canada and used for hauling oil derricks around. They are so big that they have two steering wheels, so that two sections can be steered independently. The company hired two US drivers, rented a truck (which happened to be in the US at the time), got all the paperwork cleared by the BC government in Victoria, and drove up to the border. The agent at the border said, “You can’t bring that thing in here! It’s too big!” The drivers said, “But we have the paperwork!” The agent said, “You can’t drive that thing in here!” The drivers said, “But we have the paperwork!” The agent said, “I’m cancelling the permit. You’ll have to talk to Victoria. Now park that thing over there.” The drivers parked it over there slick as a whistle (because it had two steering wheels), and the agent said, “Oh. I didn’t realize you could do that. If I had realized, I wouldn’t have cancelled the permit.” (He did let the truck through, though delayed a day because it was too late in the day to un-cancel the permit.) Now, both sides had some culpability in the communications breakdown, but the US drivers thought that because they had followed the rules, they were in the clear. You might say that the agent was at fault, but if the truck had in fact not had two steering wheels, you would have wanted him to cancel the permit.
- Jim has had to deal with Transport Canada a few times about issues pertaining to his medical clearance to fly. He’s gone in in person once, but usually talks to them on the phone. They remember his name, ask him how his flying is going, and once asked him about a newspaper article that he appeared in.
- I did a favor for someone who was off-campus and needed to submit a form. It was past the deadline, but that didn’t seem to matter.
I told some US friends about how functionaries had much more discretion in Canada, and they were appalled. “You can’t do that! You’ll get unfair treatment!”
I thought about Southwest Airlines, which we had flown on recently. One of Southwest’s advertised strengths is that its employees have quite a lot of discretion. Why was it okay for Southwest employees to have discretion but not US governmental employees?
I realized that my US friends have the very strong belief that government officials are incompetent and/or hostile: government functionaries only make your life worse, not better. The petty bureaucrats I have interacted with in Canada, however, have been competent and polite, and in general they have worked to make my life better.
While I realize that there are good agents in the US and bad agents in Canada, I think that in general the bureaucrats are better in Canada.
Permalink
05.28.07
Posted in programmer productivity at 4:38 pm by ducky
As I mentioned before, I saw two coders working on the same task using the same navigational style take very, very different lengths of time. While tracing forward through code, one happened to pick the correct choice from two equally plausible choices; the other happened to pick the wrong one and got off into the weeds.
This makes me even more convinced that the important thing when coding is to not get stuck. It also said to me that a breadth-first search (BFS) strategy (where you explore each path a little ways before examining any path in depth) was a better idea than a depth-first search (DFS) strategy. This got me thinking about why my pal didn’t do BFS, and why he spent 30 minutes off in the weeds.
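To make the BFS/DFS distinction concrete, here’s a toy sketch in PHP (the call graph is made up, and this has nothing to do with either coder’s actual code): depth-first commits to one path and follows it all the way down, while breadth-first peeks one step down every open path before going deeper.

```php
<?php
// A made-up call graph: each function maps to the functions it calls.
$calls = array(
    'main'   => array('parse', 'render'),
    'parse'  => array('lex', 'build'),
    'render' => array('layout', 'paint'),
    'lex'    => array(), 'build' => array(),
    'layout' => array(), 'paint' => array(),
);

// Depth-first: follow one path all the way down before backing up.
function dfs($graph, $node) {
    echo $node . "\n";
    foreach ($graph[$node] as $callee) {
        dfs($graph, $callee);
    }
}

// Breadth-first: visit everything one hop away, then two hops, and so on.
function bfs($graph, $start) {
    $queue = array($start);
    while (count($queue) > 0) {
        $node = array_shift($queue);
        echo $node . "\n";
        foreach ($graph[$node] as $callee) {
            $queue[] = $callee;
        }
    }
}

dfs($calls, 'main'); // main parse lex build render layout paint
bfs($calls, 'main'); // main parse render lex build layout paint
?>
```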
Eclipse is really bad at supporting BFS. Some of the things it does are generically bad, but some are badnesses all its own.
- The tabs jump around a lot. Eclipse currently has a strange way of dealing with its tabs. There is a way to get tabbing behavior that doesn’t jump around a lot, more like Firefox’s, but it is not the default. (Window->Preferences->General->Editors: put a check in “Close editors automatically”, set “Number of opened editors before closing” to 1, and select the radio button “Open new editor”. Now there will be a little “pin” icon to the right of the forward-history button that you can use to say “Don’t close this pane”.)
- Eclipse doesn’t have a per-tab/page history like Firefox does. All your history gets jumbled together. This means that you can’t use tabs/pages to help you keep track of individual tasks.
- It’s difficult to mark points for future reference. There are bookmarks in Eclipse, but most people don’t know about them. Even if you are one of the few and proud who know about bookmarks, they are hard to use. It would be nice if, in the history (which of course would be per-tab), you could see where you had bookmarks.
Many of the real stud-god hackers still use vi and emacs, and after doing all this thinking about BFS, I can see why. Emacs and vi plus xterms have really good facilities for keeping track of various streams/stacks of code traversals. If you decide you are at a decision point, where you don’t know which of two paths you should take, just open up a new xterm/editor, load up the file, and move that screen off to the side. Move it somewhere where you can still see it and remember that it is there, but where it isn’t in your way.
Inside each editor session, the editor provides good tools for quickly going forwards or backwards in the code, and that history never gets muddied with any other path that you are traversing.
Permalink
05.26.07
Posted in Hacking, Maps at 5:49 pm by ducky
I just open-sourced the code for Mapeteria. If any of you are PHP4 gods, I have a few questions…
Permalink
Posted in Technology trends at 10:34 am by ducky
Robert X. Cringely has an article where he says that Google will ultimately be killed off by employees who have left to start their own company.
While I don’t own any crystal balls, I don’t think Google is as doomed as he thinks it is. His reasoning is that the 20% time Googlers have to work on other projects will result in 4,000 new product ideas per year, of which 400 will be good, but only about ten of which can be turned into new products. He says that the people behind the other 390 will be bitter and go start their own companies.
I have to argue with some of the math.
- Not all 20% projects are for new products. I bet most aren’t, in fact. If I were to start working at Google tomorrow, I would probably try to work on:
- user-generated thematic maps
- spam reduction
- better contact management in Gmail
- programmer productivity tools
- a to-do list manager
80% of the projects on my list are either internal tools or add-ons to some existing product. It is way, way easier to think of a feature enhancement than a completely new product. I would be really surprised (and disappointed) if they aren’t already working on a to-do list manager, so my list probably has 0% new products.
- Not everybody will be working alone. If, for example, I started a new to-do list manager, there is no way that I would be able to productize it all myself in 20% time. I would want to recruit others to help me on it. This means that there will be fewer ideas than people.
- One of Google’s great strengths is its hardware infrastructure. Their 2006 financial statement showed $2.4 billion (yes, Billion with a B) worth of property and equipment assets. That gives the potential defectors a reason to stay: they have a whole lot more toys to play with if they stay (and a real disadvantage if they try to compete directly with Google).
- I never worked on a 20% project, so I don’t know if they ever get canceled. I suspect that it’s very rare that you’d get told that you had to stop working on it. Thus if you really believed in something, you’d keep working on it as a 20% project because you were sure that if you just added a frobnitz, then the powers that be would see how incredibly cool it was, and would push it. Eventually, something else that was shiny would come along and you’d put aside your wonderful thing just for a bit… and your project would just wither away.
- Working at Google is awfully pleasant. In addition to the food and stuff, you get to hang out with really nice, really smart people, and other people take care of nuisances like facilities, payroll, tech support, etc. You get to work on fun stuff that you want to do. Why would you ever leave?
While Cringely figures there will be 390 worthwhile projects per year that will get killed, I figure that the number of worthwhile new-product ideas will be less than 20: (3700 coders in FY 2006 / 3 people per team) * (1 new product / 10 projects) * (1 worthwhile product / 10 proposed products) = 12 worthwhile products.
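The same back-of-the-envelope arithmetic in code form (remember, every ratio here is my own guess):

```php
<?php
// My back-of-the-envelope estimate; every ratio here is a guess.
$coders      = 3700; // Google engineers in FY 2006
$team_size   = 3;    // assumed people per 20% project
$new_product = 0.1;  // assumed fraction of projects that are new products
$worthwhile  = 0.1;  // assumed fraction of new products worth shipping

echo round(($coders / $team_size) * $new_product * $worthwhile); // 12
?>
```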
In 2006, as near as I can tell, they launched nine new products: GCalendar, GDocuments, GSpreadsheets, GCheckout, SketchUp, GCo-op, and about three versions of existing products for mobile phones.
Only four of the things I mentioned were really new (vs. a port to phones) and came from in-house (vs. acquisitions). GCo-op doesn’t seem all that major to me, so really there were three major new in-house products: GCalendar, GSpreadsheets, and GCheckout. If my estimations are right, then that means that there were nine new products that got orphaned. Probably less than 10% of the people who have orphaned products will leave, so that means less than one project would leave. If that product required Google’s infrastructure, then chances would be even lower that it would escape.
The fact that two of the products that were released in 2006 (SketchUp and GDocuments) came from acquisitions says to me that Google doesn’t have enough new product ideas internally to keep up with the number they can release and support. I don’t know, but I suspect that I was being optimistic in my estimate of new products per 20% project. It’s probably much lower than 10%. This would mean that Google actually has quite a ways to go before they start losing people who are frustrated that their pet project got cancelled.
I expect that there are a non-zero number of people who will quit and start their own companies, but I think that will be because they see an opportunity in an area outside of Google’s business. They will decide to open a restaurant, or consult, or design video games, or set up a charter bus tour company. Some people will step off of the treadmill and raise kids, go into the ministry, or become forest rangers or professional snowboarders. While Google might miss those people, I don’t think that the professional snowboarders will be a threat to Google’s continued existence.
Permalink
05.25.07
Posted in Random thoughts at 6:48 pm by ducky
I have been a bit surprised at something that I can do that apparently most people can’t. I can listen to somebody speaking and repeat everything they say in realtime, with only the briefest of delays between when it comes out of their mouth and when it comes out of mine. From what I hear from others, most people can’t do this.
This is not a skill I’ve practiced. To the best of my knowledge, I have always been able to do this. It never occurred to me that anybody would not be able to do this. (Maybe I can simply because I assumed that I could?)
But if I’m going to have some secret power, couldn’t it be something useful? Like curing cancer? Diagnosing cancer? Or at least being able to cure canker sores?
Or, alternatively, is there any value at all in being able to simultaneously translate from English to English?
Permalink
05.21.07
Posted in Hacking at 10:09 pm by ducky
When I started Mapeteria, it seemed like it would be pretty simple. It turned out to be much more complicated than I had originally envisioned. I wasn’t terribly surprised (I have, after all, spent a long time in industry), but it was a bit annoying.
- Installation hell. It didn’t have to be that bad, but I didn’t know about the xampp project.
- Geographic data. I needed boundary information for countries and states/provinces.
- It was easy to find US data, but it was detailed/complex enough that it was really slow. Fortuitously, I saw Kevin Khaw’s state information, and he let me use it.
- Because Mapeteria makes KML that people could use anywhere, I wanted to be quite certain that I had the right to use the boundary data. I found boundary data for Canada, but it wasn’t absolutely clear to me that I had the right to redistribute it. (Unlike in the US, the Canadian government retains the copyright to governmentally-produced information like maps.) I decided it would be faster to just trace out points on Google Maps and use that for the boundaries. (That also gave me control over the complexity of the polygons.)
- I found country data relatively quickly, but it was complex enough that it was extremely slow to render on Google Maps. I was able to simplify the polygons pretty easily (by modifying a script by John Coryat that he’d adapted from Stephen Lime); there’s a rough sketch of the kind of simplification I mean at the end of this list. Unfortunately, there are a zillion little islands in that data, which make it much more complex than it needs to be. I believe that I will have to go remove all the islands by hand, yuck. 😛
- I stumbled upon boundary information for France, thanks to Alexandre Dubois (aka Zakapatul). Because I’d already done simplification for the countries, it was not hard to simplify France, but I still had to do it. I also had to strip a bunch of stuff out of the KML file that I didn’t need (like shields representing each département).
- Bugs in Other People’s Code.
- I never did get the debugger to work right in PHPEclipse, and I didn’t even have a good idea for how to troubleshoot it. So I just had no debugger. 😛 Echo statements (like printfs) were my friends.
- There is a bug in Google Maps such that polygons that straddle the 180 E/W line are just broken. This makes sense — they are inherently ambiguous. However, Siberia, Fiji, and a Russian island straddle the line, alas.
- There is a bug in Google’s maps that I found when I was tracing the Canadian outlines. It wasn’t a big deal, but I spent non-zero time on it.
- Politics. What if somebody submits a data file with information for the USSR? Or Yugoslavia in 1970? Or Ethiopia in 1950? Or East Germany? I only have boundary information for 2006, not for all possible boundaries for all time.
- Documentation.
- What do I call states/provinces/territories/départements? I spent a fair amount of time trying to figure that out. I could call them “first-level administrative divisions”, but the only people who know what that means are map geeks. I could call them states or provinces or territories, but then I’d tick someone off. I never did figure out what to call them, so I call them states/provinces/territories/departments. 🙁
- What do I call the two-letter, uh two-number, uh two-character codes for states/provinces/territories/départements?
- How much detail do I give? How much is too much?
- Testing. In addition to unit tests, I was (for a period) trying to automate more global tests by comparing a generated KML to a “golden” KML. However, I kept changing what was “golden” — I would take out or simplify polygons, add some debugging information, change from space-separated points to newline-separated points (and back), such that it was a real pain to keep the tests consistent. Eventually I gave up and just had some “eyeball” tests: does it look right?
- Evangelism.
- Who do I tell? How soon? Do I tell them about countries, even though there is still the bug in Google Maps? Even though countries display very slowly? Lots of time spent wondering about that.
- Open Source. I decided to open-source the code after I was basically done.
- I needed to go through and make my code conform to PHP standards (like using_underscores instead of CamelCase), take out some of my hacks, and clean up TODOs.
- I needed to figure out where I was going to host the code. My own server? Sourceforge? Google? None were perfect, alas, so in addition to investigating, I had to do some agonizing, too, before settling on Google hosting.
- I needed to transfer all my bugs from my private Bugzilla to the Google issue tracker.
- I still need to transfer the code, which means installing a Subversion client and figuring out how to use it. It probably won’t take long, and it’s something I should do anyway (like eating my vitamins), but it’s One More Thing.
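Since I keep talking about “simplifying” polygons, here is a rough sketch of the kind of thing I mean, in PHP. This is not Mapeteria’s actual code: the function names are made up, and real simplifiers (including the Lime/Coryat script I modified) are much smarter about preserving shape than this blind keep-every-Nth-point approach.

```php
<?php
// Crude polygon simplification: keep every $nth point, plus the final
// point so that the ring still closes where the original did.
function thin_points($points, $nth) {
    $kept = array();
    $last = count($points) - 1;
    for ($i = 0; $i <= $last; $i++) {
        if ($i % $nth == 0 || $i == $last) {
            $kept[] = $points[$i];
        }
    }
    return $kept;
}

// Emit a bare-bones KML Polygon from an array of (lat, lon) pairs.
// KML wants each coordinate as "longitude,latitude,altitude".
function polygon_kml($points) {
    $coords = array();
    foreach ($points as $p) {
        $coords[] = $p[1] . ',' . $p[0] . ',0';
    }
    return "<Polygon><outerBoundaryIs><LinearRing><coordinates>\n"
         . implode("\n", $coords)
         . "\n</coordinates></LinearRing></outerBoundaryIs></Polygon>";
}
?>
```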
So anyway, it always takes longer than you think it should; I decided to document why this time. 🙂
Permalink
Posted in Hacking, programmer productivity, Technology trends at 8:26 pm by ducky
I have seen a lot of discussion over the years of the relative strengths (or weaknesses) of specific languages. People talk about why they use this language or that language, and they seem to always talk about specific linguistic constructs of their favored language.
“Mine has generics!” “Mine has macro expansion!” “Mine has closures!”
Sometimes the various devotees will give a nod to the richness of their language’s libraries, or to the robustness of their compiler, but rarely.
Recently, I’ve been working on a hobby project in PHP while reading up on tools like odb, JML, Daikon, ESC/Java2, javaspider, and EmmaECL. The contrast is stark.
PHP is stable enough, and seems to have plenty of libraries, but PHPEclipse is quite downrev compared to Eclipse, the debugger doesn’t work at all for me (and I don’t know where to start troubleshooting), and there are essentially no additional tools. I feel like a starving dieter writing reviews for a gourmet food magazine: shackled to PHP and pining for the abundance of the Java tools.
Java’s advantages in the tool arena aren’t accidental.
- Its static typing and lack of pointers make a lot of tools easier to write.
- Having no pointers makes it easier to teach, so undergraduate classes are now usually taught in Java, which means that the grad students tend to use Java when they research new tools.
- The Eclipse IDE, being both open source and supported by IBM, is a great platform for tool development.
I am just about ready to swear fealty to Java, purely because of the richness of the third-party programming toolset.
Permalink