07.22.07

robobait: prefuse labels on graphs

Posted in Hacking, robobait, Technology trends at 10:04 pm by ducky

I’ve been working with an open-source visualization library called prefuse for a while. It’s used quite a bit, but mostly for graph visualization. I’m trying to use it for chart visualization. (Why? Because I also want to do graph visualization, and I figured — perhaps wrongly — that it would be better to learn the tao of one library well than two poorly.)

There are almost no examples out in the wild of how to do charts with prefuse. Here, then, is a link to ScatterPlotWithAxisLabels.java. Humans, you probably don’t care about this; it is just here to let the robots find it.

It is a variation on the program ScatterPlot, but with axes labelled. You wouldn’t think that would be a big deal, but there are a lot of little things you have to get right, and with few examples, it is hard to know what you have to specify and what is the default behaviour.

More on why Linux will win / gnumeric customer support

Posted in Hacking, Technology trends at 10:43 am by ducky

(Ooops, I wrote this a while ago and forgot to post it.)

In my recent post, Linux on the desktop, I mentioned that oocalc and/or gnumeric had let me down six months ago when I was working with an admittedly challenging spreadsheet. (It contained LOTS of obscure fonts from around the world.)

Within three days, I got a posting from one of the maintainers of gnumeric, asking me for more information. This is why Windows is doomed. I can’t imagine getting email from someone at Microsoft asking me for more information about a bug based on a posting in what is a pretty obscure blog.

Unfortunately, my problems were such that I couldn’t write a good bug report on it. (If I could have, I would have done so at the time. I consider writing bug reports one of the obligations of using open-source software.) At the time, there was a long, long delay between whatever-I-did-to-corrupt-the-file and my discovery of the corruption. My bug report would have said something like, “I worked for three hours, saving regularly, and at the end of the three hours, I discovered that my file was corrupt.” Alas, that kind of bug report is probably worse than no bug report, as the best that a triager could do is say WORKSFORME.

I did go back through my notes, and it looks like oocalc was the original offender, and that I switched to gnumeric at least briefly. I didn’t see anything in my notes that gnumeric let me down. However, I don’t see any .gnumeric files in the directory, and I would think that if gnumeric was working smoothly for me, I would have left at least some .gnumeric files around.

Note, though, that I had these troubles in November 2006, which is about 56 dog-months ago. I would be surprised if they had made no progress since I had trouble, and (to be fair!), the spreadsheet that I worked on was a very challenging one.

07.16.07

robobait: how to enable coredump files

Posted in Hacking, robobait, Technology trends at 5:28 pm by ducky

How do you enable core dumps?

ulimit -c unlimited

(This was harder to find than it should have been, so I’m helping the world to find it.)

Keywords: core, coredump, enable, limit, unlimit, allow.

06.09.07

Excel dorkiness

Posted in Art, Hacking, Random thoughts at 11:44 pm by ducky

I stumbled across this old post by Anil Dash where he mentioned that almost all of his geeky friends have at some point made an Excel spreadsheet to keep track of something really obsessive:

Perhaps the ultimate example of this sort of dorkiness is the fact that almost every one of my friends has, at one point or another, made at least one Excel spreadsheet to document some arcane aspect of their lives. The number of consecutive sunny days, the types and prices of the cups of coffee they drink, or just straightforward charts about their boss’s mood. There’s no end to the ways one can misuse desktop applications in one’s personal life.

I read that and thought, “Huh. I certainly haven’t done anything like that.”

Um. But then I remembered that I had generated a list of the world’s writing systems, with the likeliest start/stop usage dates, the lat/long of where it was first used, how many people currently use it, who created it (if known), and samples of characters in that system (if I could find them, and I usually could). Oh.

And then my husband pointed out that I also have enumerated various California prisons, their lat/long, the type of facility (state pen, federal pen, county jail, etc.), and how many inmates it has. Oh.

But I can honestly say that I have never used Excel to keep track of these obsessions.

I used gnumeric and oocalc.

05.26.07

Open-sourcing code

Posted in Hacking, Maps at 5:49 pm by ducky

I just open-sourced the code for Mapeteria. If any of you are PHP4 gods, I have a few questions.

05.21.07

hobby project blues

Posted in Hacking at 10:09 pm by ducky

When I started Mapeteria, it seemed like it would be pretty simple. It turned out to be much more complicated than I had originally envisioned. I wasn’t terribly surprised (I have, after all, spent a long time in industry), but it was a bit annoying.

  • Installation hell. It didn’t have to be that bad, but I didn’t know about the xampp project.
  • Geographic data. I needed boundary information for countries and states/provinces.
    • It was easy to find US data, but it was detailed/complex enough that it was really slow. Fortuitously, I saw Kevin Khaw’s state information, and he let me use it.
    • Because Mapeteria makes KML that people could use anywhere, I wanted to be quite certain that I had the right to use the boundary data. I found boundary data for Canada, but it wasn’t absolutely clear to me that I had the right to redistribute it. (Unlike in the US, the Canadian government retains the copyright to governmentally-produced information like maps.) I decided it would be faster to just trace out points on Google Maps and use that for the boundaries. (That also gave me control over the complexity of the polygons.)
    • I found country data relatively quickly, but it was complex enough that it was extremely slow to render on Google Maps. I was able to simplify the polygons pretty easily (by modifying a script by John Coryat that he’d adapted from Stephen Lime). Unfortunately, there are a zillion little islands in that data, which make it much more complex than it needs to be. I believe that I will have to go remove all the islands by hand, yuck. 😛
    • I stumbled upon boundary information for France, thanks to Alexandre Dubois (aka Zakapatul). Because I’d already done simplification for the countries, it was not hard to simplify France, but I still had to do it. I also had to strip a bunch of stuff out of the KML file that I didn’t need (like shields representing each département).
  • Bugs in Other People’s Code.
    • I never did get the debugger to work right in PHPEclipse, and I didn’t even have a good idea for how to troubleshoot it. So I just had no debugger. 😛 Echo statements (like printfs) were my friends.
    • There is a bug in Google Maps such that polygons that straddle the 180 E/W line are just broken. This makes sense — they are inherently ambiguous. However, Siberia, Fiji, and a Russian island straddle the line, alas.
    • There is a bug in Google’s maps that I found when I was tracing the Canadian outlines. It wasn’t a big deal, but I spent non-zero time on it.
  • Politics. What if somebody submits a data file with information for the USSR? Or Yugoslavia in 1970? Or Ethiopia in 1950? Or East Germany? I only have boundary information for 2006, not for all possible boundaries for all time.
  • Documentation.
    • What do I call states/provinces/territories/départements? I spent a fair amount of time trying to figure that out. I could call them “first level administrative divisions”, but the only people who know what that means are map geeks. I could call them states or provinces or territories, but then I’d tick someone off. I never did figure out what to call them, so I call them states/provinces/territories/departments. 🙁
    • What do I call the two-letter, uh two-number, uh two-character codes for states/provinces/territories/départements?
    • How much detail do I give? How much is too much?
  • Testing. In addition to unit tests, I was (for a period) trying to automate more global tests by comparing a generated KML to a “golden” KML. However, I kept changing what was “golden”: I would take out or simplify polygons, add some debugging information, or change from space-separated points to newline-separated points (and back), such that it was a real pain to keep the tests consistent. Eventually I gave up and just had some “eyeball” tests: does it look right?
  • Evangelism.
    • Who do I tell? How soon? Do I tell them about countries, even though there is still the bug in Google Maps? Even though countries display very slowly? Lots of time spent wondering about that.
  • Open Source. I decided to open-source the code after I was basically done.
    • I needed to go through and make my code conform to PHP standards (like using_underscores instead of CamelCase), take out some of my hacks, clean up TODOs.
    • I needed to figure out where I was going to host the code. My own server? Sourceforge? Google? None were perfect, alas, so in addition to investigating, I had to do some agonizing, too, before settling on Google hosting.
    • I needed to transfer all my bugs from my private Bugzilla to the Google issue tracker.
    • I still need to transfer the code, which means installing a Subversion client and figuring out how to use it. It probably won’t take long, it’s something I should do anyways (like eating my vitamins), but it’s One More Thing.
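The golden-file churn in the Testing bullet came partly from whitespace-only changes (space-separated versus newline-separated points). Here is a minimal sketch of a comparison that ignores those. It is not what Mapeteria does (Mapeteria is PHP, and this is Java); the class and method names are made up for illustration, and it would not help with the other kinds of change, such as simplified polygons or added debugging output.

```java
/** Hypothetical helper, invented for illustration; not part of Mapeteria. */
class GoldenCompare {
    /** Collapses every run of whitespace (spaces, tabs, newlines) to one space. */
    static String normalize(String kml) {
        return kml.trim().replaceAll("\\s+", " ");
    }

    /** True if the two documents differ only in how whitespace is laid out. */
    static boolean sameIgnoringWhitespace(String golden, String generated) {
        return normalize(golden).equals(normalize(generated));
    }

    public static void main(String[] args) {
        String golden = "<coordinates>1,2 3,4</coordinates>";
        String generated = "<coordinates>1,2\n3,4</coordinates>\n";
        System.out.println(sameIgnoringWhitespace(golden, generated)); // prints true
    }
}
```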

So anyway, it always takes longer than you think it should; I decided to document why this time. 🙂

comparative programming linguistics

Posted in Hacking, programmer productivity, Technology trends at 8:26 pm by ducky

I have seen a lot of discussion over the years of the relative strengths (or weaknesses) of specific languages. People talk about why they use this language or that language, and they seem to always talk about specific linguistic constructs of their favored language.

“Mine has generics!” “Mine has macro expansion!” “Mine has closures!”

Sometimes the various devotees will give a nod to the richness of their language’s libraries, or to the robustness of their compiler, but rarely.

Recently, I’ve been working on a hobby project in PHP while reading up on tools like odb, JML, Daikon, Esc/Java2, javaspider, and EmmaECL. The contrast is stark.

PHP is stable enough, and seems to have plenty of libraries, but PHPEclipse is quite downrev compared to Eclipse, the debugger doesn’t work at all for me (and I don’t know where to start troubleshooting), and there are essentially no additional tools. I feel like a starving dieter writing reviews for a gourmet food magazine: shackled to PHP and pining for the abundance of the Java tools.

Java’s advantages in the tool arena aren’t accidental.

  • Its static typing and lack of pointers make a lot of tools easier to write.
  • Having no pointers makes it easier to teach, so undergraduate classes are now usually taught in Java, which means that the grad students tend to use Java when they research new tools.
  • The Eclipse IDE, being both open source and supported by IBM, is a great platform for tool development.

I am just about ready to swear fealty to Java, purely because of the richness of the third-party programming toolset.

05.20.07

software tools: JML and Daikon

Posted in Hacking, programmer productivity at 12:40 pm by ducky

From the name, I had thought that the Java Modeling Language (JML) was going to be some specialized variant of UML. I haven’t worked with UML, but what I see other people doing with it is drawing pictures.

Instead, JML turns out to be a specification for putting assertions about the code’s behaviour in comments. In that way, it is much more similar to Javadoc than to UML.

With both Javadoc and JML, the author of a method makes promises about what the method will do and what the method requires. In Javadoc, for example, @return int is a promise that the method will return an int, and @param foo int says that the method needs a parameter foo of type int.
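As a minimal sketch of what those Javadoc promises look like in source (the class and method are made up for illustration): note that in a real Javadoc comment the tags carry a description, and the types themselves come from the method signature.

```java
/** Hypothetical example class, invented for illustration. */
class Doubler {
    /**
     * Doubles a number.
     *
     * @param foo the value to double
     * @return twice the value of foo
     */
    static int twice(int foo) {
        return 2 * foo;
    }

    public static void main(String[] args) {
        System.out.println(twice(21)); // prints 42
    }
}
```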

With Javadoc, however, the promises are pretty minimal and the requirements are all stated elsewhere (like in the method definition). With JML, the promises about post-conditions and requirements on the pre-conditions can be much more elaborate. The programmer can promise that after the method finishes, for example:

  • a particular instance variable will be less than and/or greater than some value
  • an output sequence will be sorted in ascending order, or
  • variable foo will be bigger than when the method started.

The programmer can also state very detailed input requirements, like that an instance variable can’t be null, that an input must be in a certain range, or that the sum of foo and bar must be less than baz.
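Here is a rough sketch of how a few of those promises and requirements might be written. The class, fields, and ranges are invented for illustration. The lines starting with //@ are JML annotations; a plain Java compiler treats them as ordinary comments, so this compiles and runs without any JML tools installed.

```java
/** Hypothetical class, invented for illustration; the //@ lines are JML. */
class Counter {
    private /*@ spec_public @*/ String label;
    private /*@ spec_public @*/ int foo;

    // An instance variable that can't be null.
    //@ invariant label != null;

    // Inputs must be non-null and in a certain range.
    //@ requires name != null;
    //@ requires 0 <= start && start <= 100;
    //@ ensures foo == start;
    Counter(String name, int start) {
        this.label = name;
        this.foo = start;
    }

    // foo will be bigger than when the method started.
    //@ requires 0 < step && step <= 10;
    //@ ensures foo > \old(foo);
    void bump(int step) {
        foo += step;
    }

    //@ ensures \result == foo;
    /*@ pure @*/ int getFoo() {
        return foo;
    }

    public static void main(String[] args) {
        Counter c = new Counter("demo", 5);
        c.bump(3);
        System.out.println(c.getFoo()); // prints 8
    }
}
```

In the annotations, \old(foo) refers to the value of foo when the method was entered, and \result refers to the method's return value.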

This rigorous definition of pre- and post-conditions is useful for documentation. The next programmer doesn’t have to read through the entire method to figure out that foo can’t be null, for example.

Additionally, the JML spec is rigorous enough that it can be used with a variety of interesting and useful tools. With a special compiler (jmlc), pre- and post-condition checks can get compiled into the code. Thus if someone calls a method with a parameter outside the allowed bounds, the code can assert an error. (The assertions can also be turned off for production code if so desired.)

But wait, there’s more! The specs are rigorous enough that a lot of checking can be done at compile time. If method A() promises that its output will be greater than 3, and method B() says that it requires its input to be greater than 5, then B(A()) would give a warning: A can give output (between 3 and 5) that B would gag on. See ESC/Java2.
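The A/B situation might look something like the sketch below (class and method names invented for illustration). The //@ lines are JML and are just comments to a plain Java compiler, so the program runs happily; the point is that a static checker reading the annotations has enough information to complain about the call in main. The exact warning a particular tool prints is not shown here.

```java
/** Hypothetical class, invented for illustration; the //@ lines are JML. */
class Pipeline {
    // a() only promises that its result is greater than 3.
    //@ ensures \result > 3;
    static int a() {
        return 4;
    }

    // b() requires its input to be greater than 5.
    //@ requires x > 5;
    //@ ensures \result == x - 5;
    static int b(int x) {
        return x - 5;
    }

    public static void main(String[] args) {
        // a() may return 4 or 5, which b() does not accept.
        // Plain Java runs this anyway and prints -1.
        System.out.println(b(a()));
    }
}
```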

But wait, there’s more! The JML annotations can be used to create unit tests. The jmlunit tool writes tests that wiggle the input parameters over the legal inputs to a method, and checks to see if the outputs are correct.

There’s the small problem that it’s a pain to write all the pre- and post-conditions. Fortunately, there’s a tool (Daikon) which will help you with that. Daikon watches as you run a program and sees what changes and what doesn’t change, and from that, generates promises/requirements. Note that those might not be correct. If there are bugs in your program or if that execution of your program didn’t hit all the corner cases, then those won’t be correct. However, it will give you a good start, and I find that it is easier to spot mistakes in somebody else’s stuff than it is to spot omissions in things that I did.

This is all way cool.

05.19.07

Mapeteria: user-generated thematic maps

Posted in Hacking, Maps at 8:08 pm by ducky

A year ago, while I was in the midst of working on my Census Maps mashup, my Green College colleague Jana came up to me with a question. “I have a table of data about heat pump emissions savings for each province, and I want to make a map that colors each province based on the savings for that province. What program should I use to do that?”

I thought about all the work that I’d done for the Census Maps mashup — learning the Google Maps API, digging up the shape files for census tract boundaries, downloading and learning how to use the shapelib libraries to process the shapefiles, downloading and learning how to use gd, reacquainting myself with C++, reacquainting myself with gdb, debugging, trying to figure out why certain census tracts looked strange, etc, and rendered her an authoritative response: “Use Photoshop”, I said.

I was really dismayed that I had to tell her to use a paint program. Why should she — a geographer — have to learn about vertices and alpha channels and statically loaded libraries? Why wasn’t there some service where she could feed in a spreadsheet file and get back a map?

Well, I finally got tired of waiting for Google to do it, so I developed Mapeteria, my own service for users to generate their own thematic maps.

If you give Mapeteria a CSV file (which is one of the formats that any spreadsheet program will be delighted to save as) plus a little more information about how it should be displayed, it will give you back a map. You can either get a KML file (which you can look at in Google Earth) or a Google Maps mashup that shows the map directly in your web browser.

So Jana, here’s your map!

Emissions savings of heat pumps vs. natural gas

05.14.07

software tools: omniscient debugger

Posted in Hacking, programmer productivity at 9:41 am by ducky

Bil Lewis has made odb, an “omniscient debugger”. It saves every state change during the execution of a program, then lets you use the debugger to step forward through the execution or backwards. You don’t have to guess about why a variable has the value it does; you can quickly jump to the last place it was set.
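To make the idea concrete, here is a toy sketch of "save every state change, then look backwards". This is not how odb is implemented; the class, its methods, and the hand-written record() calls are all invented for illustration, standing in for whatever automatic recording a real tool does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy recorder, invented for illustration; not odb's API or design. */
class StateRecorder {
    /** One recorded state change. */
    static final class Change {
        final int step;        // position in the overall execution history
        final String variable; // name of the variable that was set
        final Object value;    // the value it was set to
        final String where;    // label for the place in the code that set it

        Change(int step, String variable, Object value, String where) {
            this.step = step;
            this.variable = variable;
            this.value = value;
            this.where = where;
        }
    }

    private final List<Change> log = new ArrayList<>();

    /** Appends a state change to the history. */
    void record(String variable, Object value, String where) {
        log.add(new Change(log.size(), variable, value, where));
    }

    int size() {
        return log.size();
    }

    /** Walks backwards from the given step to the last place the variable was set. */
    Change lastSet(String variable, int atOrBeforeStep) {
        for (int i = Math.min(atOrBeforeStep, log.size() - 1); i >= 0; i--) {
            Change c = log.get(i);
            if (c.variable.equals(variable)) {
                return c;
            }
        }
        return null; // never set at or before that step
    }

    public static void main(String[] args) {
        StateRecorder r = new StateRecorder();
        int x = 1;
        r.record("x", x, "line A");
        int y = 10;
        r.record("y", y, "line B");
        x = x + y;
        r.record("x", x, "line C");

        Change c = r.lastSet("x", r.size() - 1);
        System.out.println(c.where + " set x to " + c.value); // prints "line C set x to 11"
    }
}
```

Even this toy shows where the resource worry comes from: the log grows by one entry for every assignment the program makes.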

This seems like an extremely useful way to do things! Bil Lewis wonders why more people aren’t using his better mousetrap: “(I don’t understand why. I’m clever, I’m friendly, I’m funny. Why don’t people go for my stuff? I dunnknow.) ”

I like it, but…

  • Its installation is a bit fragile. If you don’t know exactly what you are doing and do everything exactly right, there isn’t a whole lot of help to get you back on track. (If you are a Java stud and don’t make any mistakes, the installation is straightforward.)
  • It is not an IDE. It is a stand-alone tool. Yes, its web page says that it can be launched from inside Eclipse, but it sure looks like that just spawns it as a separate process: it doesn’t look like it plugs into the standard Eclipse debugger. That means that it has a separate learning curve and fewer features.
  • I worry that for programs larger than “toy” programs, the amount of data that it will have to collect will overwhelm my system’s resources. Will I be able to debug Eclipse, for example?

Now, if it were tightly integrated with Eclipse, like this internal C simulation tool that Cisco presented at EclipseCon, I would be All. Over. It. As it is, I’m probably going to use odb, but only occasionally.

Update 21 May 2007: There is an omniscient version of gdb! It’s called UndoDb.

Update 28 July 2008: There is a commercial Java omniscient debugger called Codeguide, and a research one called TOD (Trace-Oriented Debugger). Also, Andrew Ko’s Designing the Whyline describes a user study of an omniscient debugger, although the fact that it is an omniscient debugger is kind of buried.
