Predictions for 2020

Posted in Technology trends at 1:46 am by ducky

Oh what the heck, since everybody else is doing it, here are my predictions for 2020:

  1. Essentially all cell phones will have built-in video cameras, GPS, and voice controls.
  2. At least one country will nationalize music in some way, e.g. paying the music companies a per capita fee for their citizens every year.  Some countries will strike copyright laws for music.  Most just won’t bother enforcing copyright laws for music.
  3. Improved search + improved geo-location of social media streams will mean that it will be far easier to get information about your micro-neighbourhood.  Think Google Trends or Google Flu or Twitter Trends, but for the five mile radius of where you are right now.  (And, because of #1, you can get video.)
  4. Know where newspapers are right now, on the brink of death?  That’s where TV will be in 2020 — squeezed between on-demand entertainment and crowd-generated news.
  5. The cancer cure rate will be 90% for most cancers, and 40% for the most difficult ones (bone, brain, pancreas, and liver).  Treatment will, unfortunately, still majorly suck for most patients.
  6. Mapping will extend to reconstruction of scenes based on user photos (like what Microsoft demonstrated at TED) in a big way.  By 2020, 100% of San Francisco’s publicly accessible spaces (yes, including alleys) will be mapped, and about 35% of interior spaces.  People at first will be quite upset that the world can “see into” their living room, but they will end up getting used to it.
  7. Marriage for same-sex couples will be recognized by the U.S. government.
  8. Know where newspapers were five years ago, sort of moseying down the path of death?  That’s where universities will be in 2020.  They will face pressure as superb educational content will become a commodity.  Third-party organizations will jump into the mix to provide tutoring and certification, leaving non-research universities with little to offer aside from post-teen socialization and sports.
  9. 30% of the world’s energy production will be solar in 2020.  (It’s going to be one hell of a race between climate change and solar energy production, but I think solar energy will win.  All the climate-change deniers will say, “See!  Toldja so!”)
  10. Data format description languages will mean that data will get passed around in compact formats instead of in XML.  (Yes, the DFDL might be in XML, but the data wouldn’t be.)

Okay, I admit it, #10 might just be wishful thinking.

Update: At the time I wrote this, I had not read up on the Google Nexus One phone, which I now find out has voice commands for just about everything.  I guess prediction #1 about voice was under-optimistic!


Early days of the computer revolution

Posted in Hacking at 3:39 pm by ducky

I was talking to a friend of mine who I’m guessing was born in the late 1970s, and mentioned that I had been using computers since 1968ish and email since 1974.  (Yes, really.)

I saw a lightbulb go off over his head.  He knew I was in my mid-forties, and knew I was a computer geek, but hadn’t ever really put two and two together.  “Oh!  You were around for the start of the personal computer revolution?  What was that like?  That must have been totally *COOL*!!!”

I said, “Not really.”

He was stunned.  How could it have possibly not been totally cool and awesome?  I could see him struggling with trying to figure out how to express his confusion, how to figure out what question to even ask.

I said, “Look.  You are old enough that you were around for the start of the mobile phone revolution, right?  That must have been totally *COOL*!!!”

“Oh”, he said.  “I get it.”

Mobile phones, when they first started out, were kind of cool, I guess, but they weren’t “magic” then: they took real effort and patience.  They were wickedly expensive, heavy, had lousy user interfaces, and you had to constantly worry about whether you had enough battery life for the call.  The reception was frequently (usually?) poor, so even if you could make a connection, your call frequently got dropped.  The signal quality was poor, so you had to TALK LOUDLY to be heard and really concentrate to understand the other person.  And they didn’t do much.  It took a long time for mobile phones to become “magic”, and they only got better incrementally.

When they first came out, personal computers were kind of cool, I guess, but they weren’t “magic” then: they took real effort and patience.  They were wickedly expensive.  They crashed frequently enough that you always had to worry about saving your work.  They had so little disk space that managing your storage was a constant struggle (and why floppies held on for so very long after the introduction of the hard drive).

When I started my first job out of college (working at a DRAM factory for Intel in 1984), they had only recently put in place two data-entry clerks to input information about the materials (“lots”) as the lots traversed the manufacturing plant.  (When did the lot arrive at a processing step?  When did it get processed?  Who processed it?  What were the settings and readings on the machine?)  However, to get at that information, engineers like me had to get a signature from a higher level of management to authorize a request to MIS (which might get turned down!) for that information.

A few months after I started, they bought three IBM PC XTs and put them in a cramped little room for us engineers to use.  I believe the only programs on them were a word processor and a spreadsheet.  There was no storage available to us.  They had hard drives, but we were not allowed to leave anything on the hard drive; we had to take our work away on floppies.  The PCs were not networked, so not only was there no email (and of course no Web), but no way to access the data that was collected out on the manufacturing floor.

If I wanted to make a spreadsheet analyzing e.g. the relationship between measured thickness of the aluminum layer, the measured sputtering voltage, and how long it had been since the raw materials had been replenished, I would go to the factory floor, walk around to find different lots in different stages of processing, copy the information to a piece of paper, take the piece of paper and a floppy disk to the computer room (hoping that there was a free computer), copy the data from the piece of paper into the spreadsheet, print the spreadsheet (maybe making a graph, but that was a little advanced), and copy the data onto my floppy if I wanted to look at it again.

Even though we “had computers”, we had no network, no email, and no wiki.  The way I shared information was still:

  • make a bunch of photocopies and put them in people’s (physical) mailboxes,
  • make a bunch of photocopies and walk around putting them on people’s desks,
  • make a bunch of photocopies and pass them out at a meeting where I presented my results, or
  • make one photocopy, write a routing list on it (a list of names with checkboxes), and put it on the first person’s desk.  They would read it, check their name off, and pass it to someone else on the list.

At the next company I worked at (1985-7), I got a PC on my desk because I was implementing the materials tracking system (because I had bitched about how stupid it was not to have one — yeah!!).  Our company had a network, but it was expensive and complicated enough that my desktop computer wasn’t on the network, nor were the two machines on the floor.  Ethernet used a coax cable and (if I recall correctly) you had to make a physical connection by puncturing the cable just right.  The configuration was tricky and not very fault-tolerant: if one computer on the network was misconfigured, it would mess up the entire network AND it was difficult even for a skilled network technician to figure out which computer was misconfigured.

At the next company I worked for (1987), I had a Sun workstation on my desk, and we had network file servers, but I don’t think we used email, even internally.

At my next company (1988-90), I had a Wyse 50 “glass teletype” on my desktop and full email capability.  The Wyse 50 didn’t have any graphics, but that wasn’t a real big deal because no programs I would ever want to use at work had graphics of any sort. My department used email heavily, but there were some departments in the company that did not use email, so there were lots of memos that were still issued on paper.  They would go either into my (physical) mailbox, or would be pinned to cubicle corridor walls.

While I had the theoretical ability to send email to the outside world then, almost nobody I knew outside the company had email, and figuring out how to address messages to get to the outside was difficult: you had to specify all the intermediate computers, e.g. sun!ubc!decwrl!decshr!slaney.

It wasn’t until that company imploded in about 1990 and my colleagues scattered to other computer companies that I had anyone outside my company to correspond with.  (Fortunately, at about the same time, it got easier to address external email messages.)

It wasn’t until 1991 that I stopped seeing paper memos — a full fifteen years after the introduction of the Apple II computer.  My husband reports that he also stopped seeing paper memos in about 1991.

I would contend that computers didn’t really start to become “magic” until about 1996 or so, when the World Wide Web had been absorbed by the masses and various Web services were available.  Only after about 1996 could you pretty reliably assume that anyone (well, those born after WW2 started, at least) used computers or had an email address.

So when personal computers first came out, they were not totally cool.  The idea of personal computers was totally cool.  The potential was totally cool.  But that potential was unrealized for many many years.


Right-brain vs. left-brain: Sarah Palin

Posted in Politics at 1:48 pm by ducky

Sarah Palin is, you might have noticed, a very polarizing politician.  Liberals are absolutely flummoxed that anybody could like her.  Conservatives can’t understand why anyone wouldn’t like her.  I think that Sarah Palin shows up a fundamental difference in values between liberals and conservatives: conservatives value right-brain thinking and liberals don’t.

As I have posted before, Jonathan Haidt found that liberals and conservatives place different weights on aspects of morality.  Liberals weight fairness much more highly than conservatives, for example, and conservatives weight what Haidt calls “purity” much more highly than liberals.  “Purity” is IMHO a poor term for it: “gut instinct” is probably a better term.  It’s getting the feeling that something is wrong or right.  This is a right-brain function.

Our educational system works hard to get people to stop listening to their gut, to process with the logical, procedural, lingual left-brain side.  There are good pedagogical reasons for this: the right brain is fundamentally non-lingual, so it is difficult (if not impossible!) to explain right-brain decisions, to examine the decisions for errors in reasoning or assumptions, or to grade right-brain reasoning.  The right brain can only communicate its conclusions with feelings, with “gut instincts”.

The right-brain does not communicate its decisions well, but that doesn’t mean that its processing is invalid.  There are many things that the right brain can do that the left brain cannot.  You cannot derive a great song, deduce that your spouse loves you, or prove that that figure a block away is your cousin Chris.  People who make decisions only with the left-brain, only with facts and logic are more vulnerable to errors in the models or starting assumptions.  (One might argue that the entire mortgage meltdown came from an over-reliance on left-brain reasoning and paying inadequate attention to the little voices saying, “waitaminute — can this really work?”)

If you value right-brain processing, then the political climate for you must be very frustrating.  Liberals don’t even pay lip service to right-brain processing: it is so non-valued that it is a complete blind spot for them.  (If Obama was any more left-brain, he’d fall over.)   I can imagine that it would also be scary to see your beautiful country in the hands of people who apparently are paying no attention at all to their gut.

Sarah Palin is total right-brain.  Here is what she said when Bill O’Reilly asked her if she was smart enough to be president:

I believe that I am because I have common sense, and I have, I believe, the values that are reflective of so many other American values. And I believe that what Americans are seeking is not the elitism, the kind of a spinelessness that perhaps is made up for that with some kind of elite Ivy League education and a fact resume that’s based on anything but hard work and private sector, free enterprise principles. Americans could be seeking something like that in positive change in their leadership. I’m not saying that has to be me.

Nothing in her answer has to do with left-brain facts or logic, and in fact she skewers left-brain training (“elite … education” and “a fact resume”).  She is also completely unapologetic about being right-brained; instead of being guilty and ashamed of it, she gets angry and frustrated at her critics.  This is a high-status behaviour, and people think that high-status people do good things, as I have posted about before.

Meanwhile, liberals look at her left-brain abilities and are appalled.  They find fault with her left-brain abilities, as evidenced by what they see as her rhetorical weaknesses: her inability to marshal facts into the type of coherent, rhetorically logical arguments that they favour.  They do not value her right-brain rhetorical abilities — her ability to reach people’s “guts” — because they do not value right-brain skills.  The conservatives are less bothered by her weakness in left-brain skills because they do not value left-brain skills as much.

The left also remembers G. W. Bush, who was also very right-brain, going on gut and instinct.  They think that his instincts were frequently wrong (e.g. Iraq did not have weapons of mass destruction) with disastrous consequences.  So to some extent, they are punishing Sarah Palin for what they saw as G. W. Bush’s mistakes.

Note: I have been somewhat loose with the terms “liberal” and “conservative” here.  While I think there are probably not very many right-brain liberals, there are left-brain conservatives.  Andrew Sullivan is clearly a left-brain conservative, and Sarah Palin clearly drives him absolutely nuts.

Australian maps

Posted in Maps at 1:27 am by ducky

My friend Maciek Chudek and I entered two maps into the Mashup Australia contest: Shades of a Sunburnt People and Stimulating a Sunburnt People. The former shows information about the 2006 Australian Census:

Median age

Redder areas have a higher median age; gold areas are younger.  (The red maxes out at 45 years old; any area with a median age of 25 or under is full gold.)  Grey areas are ones which had so few people that the Australian Bureau of Statistics withheld the data for privacy reasons.

Our other map shows information about the rail, roads, and community infrastructure component of the Australian economic stimulus package:

Australian stimulus program spending

Blue areas are represented by the Australian Labor Party (which controls Parliament), and reddish areas are controlled by other parties.  The darker the colour, the more money has been allocated.  Dots represent individual projects.  As with the Canadian economic stimulus package, we found a systematic bias favouring areas represented by the governing party.

Since these are so similar to my US census map and the Canadian stimulus map, you might think that this was totally straightforward to do.  You might be wrong.  We did quite a bit of massaging the data to get it out, and Maciek did a lot of analysis of the stimulus information.


Canadian stimulus infrastructure leaving Québec out

Posted in Canadian life, Politics at 1:29 pm by ducky

Update: Some of the ridings were assigned to neighbouring ridings due to losing some precision in the input lat/lng.  This did not make a big difference in the overall picture, as only 2.7% of the projects were classified incorrectly.  I’ve updated this blog posting and the map; we probably won’t update the spreadsheet unless we have strong requests to do so.

There has been a fair amount of press lately on the distribution of Canadian stimulus money, with most of what we’d heard saying that Conservative ridings were getting more than their fair share of stimulus money, e.g. The Globe and Mail’s Stimulus Program favours Tory ridings.  Conservatives countered that it was important to look at the big picture, and that there were multiple stimulus programs. The National Post’s Liberal, NDP ridings getting more than fair share of infrastructure money: analysis reported that non-Conservative ridings were getting more than their fair share of the Knowledge Infrastructure Program grants.

My husband and I kind of looked at each other and said, “We can analyze that data!”, so we did.  We saw a bias in Conservative/non-Conservative ridings, but it wasn’t huge.  We found that Conservative ridings got 51% of the projects, while only 46% of the ridings are Conservative.  NDP ridings got 15% of the projects, despite only having 12% of the ridings, and even Liberals got slightly more than “their fair share”, with 27% of the projects and only 25% of the ridings.

So who is getting less?  The Bloc Québécois.  With 15% of the ridings, the Bloc only got 6% of the projects.

If you look at the breakdown by province, it looks like Ontario is getting way, WAY more than its fair share, with some other provinces — especially Québec — getting less than their fair share.

Province   % of projects   % of population
AB          6.71           11
BC          9.2            13
MB          3.81            3.6
NB          3.05            2.2
NF          3.33            1.5
NS          4.20            1.8
NT          0.55            0.13
NU          0.39            0.094
ON         53.0            39
PE          1.7             0.42
QC          8.2            23
SK          5.26            3.1
YT          0.53            0.1
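
One way to put a number on “fair share” is to divide each province’s share of projects by its share of population, so that 1.0 would be exactly proportional.  The following is only a sketch using the figures from the table above; the names SHARES and share_ratio are made up for this illustration.

```python
# Share of projects and share of population (both in percent), from the table above.
SHARES = {
    "AB": (6.71, 11), "BC": (9.2, 13), "MB": (3.81, 3.6), "NB": (3.05, 2.2),
    "NF": (3.33, 1.5), "NS": (4.20, 1.8), "NT": (0.55, 0.13), "NU": (0.39, 0.094),
    "ON": (53.0, 39), "PE": (1.7, 0.42), "QC": (8.2, 23), "SK": (5.26, 3.1),
    "YT": (0.53, 0.1),
}


def share_ratio(province):
    """Return (% of projects) / (% of population); 1.0 means a proportional share."""
    projects, population = SHARES[province]
    return projects / population


if __name__ == "__main__":
    # List provinces from most under-served to most over-served.
    for province in sorted(SHARES, key=share_ratio):
        print(f"{province}: {share_ratio(province):.2f}")
```

By this measure Québec comes out lowest of all (about 0.36), with Alberta and British Columbia also below 1.0 and Ontario at about 1.36.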

Dollar values are much harder to estimate because the value of each of the projects is given as a range — “under $100K”, “between $100K and $1M”, etc.  We made our best guesses at how to calculate that, and our best estimate gives Québec 12% of the dollars for 23% of the population — better, but still way less than they should be getting.

Now, there might be some errors in the data, as described below.  However, we think that this is worth investigation, and soon.

If you would like to look at the data yourself, Jim put together a spreadsheet in Open Office format, a slightly less-powerful spreadsheet in Excel format, and a PDF showing information from the spreadsheet, available on his writeup of the data. I of course made a map of the data.


  • We are in no way affiliated with the Government of Canada or Statistics Canada.  This analysis does not represent government policy.
  • There are multiple parts to the stimulus package, and this analysis only covers the infrastructure component.  Other money in the stimulus plan is going towards improving the financial system (which I think means “bank bailouts”, but I’m not sure), extending unemployment benefits, etc.
  • I assigned ridings based on the latitudes and longitudes that were given in the Economic Action Plan’s data (which Jim pulled down using their API).  We have some doubts about the integrity of those lat/long pairs, especially since two of the stimulus projects have lat/longs that are unquestionably in the United States.
  • My software truncated the latitude and longitude used to assign ridings down to two digits, which means the points can appear a little bit to the east and/or south of their actual location.  In cases where a point is very near a riding border, that could mean that it would be assigned to the neighbouring riding.  Update: 174 projects were assigned to the wrong riding, but this did not make a huge difference in the aggregate numbers.  It meant that Conservative ridings got 51.1% of the projects instead of 52%, and Quebec got 8.2% instead of 8.6%.  The message stays the same: there is inequity here.
  • It might be that some national projects are assigned a lat/lng in Ontario because there wasn’t an obvious locus for the project.  However, if that were the case, then I would expect that Ottawa would have the biggest number of projects.  In fact, the most projects (144) are in the Vaughan riding in Toronto, represented by MP Hon. Maurizio Bevilacqua (L).  The Ottawa Centre riding does have the second-highest, but only 101 projects out of the 6424 projects — not enough to explain why QC has so few projects.
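
The east/south drift from truncating coordinates can be shown with a small sketch.  It assumes that “two digits” means two decimal places and that the truncation was toward zero; it is not the actual code, and the sample point is hypothetical.

```python
import math


def truncate(value, places=2):
    """Drop digits beyond `places` decimal places (toward zero, no rounding)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


# Hypothetical point, roughly in Ottawa; not taken from the stimulus data.
lat, lng = 45.4215, -75.6972

t_lat, t_lng = truncate(lat), truncate(lng)
# In Canada latitudes are positive and longitudes negative, so truncating
# toward zero lowers the latitude (south) and raises the longitude (east).
print(t_lat, t_lng)
```

Rounding to the nearest hundredth instead would spread the error in all directions rather than always pushing points the same way.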


  • Information on the stimulus projects came from Canada’s Economic Action Plan
  • Canadian federal riding geometries came from Elections Canada, which requires this notice: © The federal Electoral Districts Boundaries (Representation Order 2003), Elections Canada. All rights reserved. Reproduced with the permission of Elections Canada, Ottawa, Ontario K1A 0M6 Canada (2007).
  • Information on Canadian MPs came from the House of Commons Web site and were produced by the Government of Canada.  The right to reproduce for non-commercial use is given here.
  • Provincial population figures came from Statistics Canada.

Update: I looked at per-capita figures, and that makes things look even more skewed. Five of the top ten ridings in projects per hundred thousand people are in Ontario:

Riding                               Projects per 100,000 people
Kenora, ON                           132.211
Algoma—Manitoulin—Kapuskasing, ON    124.42
Yukon, YT                            111.94
Parry Sound—Muskoka, ON              101.90
Ottawa Centre, ON                     92.37
Egmont, PEI                           91.91
Vaughan, ON                           91.44
Nunavut, NU                           84.82
Western Arctic, NT                    84.41
Labrador, NL                          79.65

The top riding in Quebec, by comparison, ranks 82nd out of 308 ridings (at 29.9 projects per hundred thousand people).

(Note that ridings have roughly between 25 and 125 thousand people, with the average right around one hundred thousand.)


US State legislators’ affiliations

Posted in Maps at 11:34 am by ducky

I have two more political layers up on my political/demographic map: US state senators and US state representatives (or assemblymembers, as they are called in some states).  Alaska and Hawaii didn’t fit nicely on these images, but you can see them on the political/demographic map.

In the pictures below (and on my site), red is Republican; blue is Democratic.  (To those outside the US who are used to red meaning liberal, the US does its colours backwards, sorry.) Some districts elect multiple members; in those cases I average the colour, with exact Democratic/GOP balance being white.  In cases where there is a vacancy or a third-party affiliation, the colour is also white.
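
For the curious, here is a rough sketch of how a colour for a multi-member district could be worked out.  The linear blend and the function name district_colour are assumptions for illustration; the text above only says that colours are averaged and that an exact balance is white.

```python
def district_colour(democrats, republicans):
    """Return an (r, g, b) tuple: pure blue for all-Democratic, pure red for
    all-Republican, white for an exact balance or when neither party holds a seat."""
    total = democrats + republicans
    if total == 0:
        return (255, 255, 255)  # vacancy or third-party members only
    balance = (republicans - democrats) / total  # -1.0 (all D) to +1.0 (all R)
    fade = round(255 * (1 - abs(balance)))       # 255 at balance, 0 at either extreme
    if balance >= 0:
        return (255, fade, fade)  # shades of red
    return (fade, fade, 255)      # shades of blue


if __name__ == "__main__":
    print(district_colour(1, 1))  # evenly split two-member district
    print(district_colour(1, 2))  # leans Republican
```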

Here are the state senators:

US State Senator Party Affiliation

Continental US State Senator Party Affiliation

Below is the party affiliation of the lower chamber members (who are usually called Representatives, but also sometimes Assemblymembers or Delegates).  Note that Nebraska doesn’t have a lower chamber.

US State Lower Chamber Members' Party Affiliation

Continental US State Lower Chamber Members' Party Affiliation

Most of the party affiliation data came from the excellent Project Vote Smart.  What they didn’t have, I gleaned from the appropriate state legislature’s page, Wikipedia, or both.

For comparison, the images below show all the districts in the continental US in random colours:

Continental US State Senate Districts

Continental US State Lower Chamber Districts

There are almost 8000 state and federal legislators in the US for a population of 300M people, or about one legislator per 37,500 people.   The number of legislators varies wildly by state, however.  New Hampshire currently has 424 state and federal legislators representing a population of 1.3M, or one legislator for every 3066 people.  California currently has 176 representing a population of 36M, or one legislator for every 204,000 people.
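
The per-legislator figures are simple division.  This short sketch redoes the arithmetic with the rounded numbers quoted in this post; the function name is just for illustration.

```python
def people_per_legislator(population, legislators):
    """Average number of residents represented by each legislator."""
    return population / legislators


if __name__ == "__main__":
    # Rounded figures quoted in the paragraph above.
    print(round(people_per_legislator(1_300_000, 424)))      # New Hampshire
    print(round(people_per_legislator(36_000_000, 176)))     # California
    print(round(people_per_legislator(300_000_000, 8_000)))  # whole US
```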


Median household income map done

Posted in Maps at 7:11 pm by ducky

I just added the median household income to a demographics map, and my oh my you see so much more at the census tract level than you do at the county level.  (I recommend making it a bit more opaque to help you see better.)

The map makes me think of mosquito bites: cities have a white center (low-income), surrounded by an angry red ring (the wealthy suburbs), with white again out in the rural areas:


In this image, full white is a median household income of $30,000 per year (in 1999 dollars), while full red is $150,000.  Grey is for areas that the Census Bureau didn’t report a median income for — presumably because too few people lived there.  The data is from the 2000 census.
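
As a sketch of the colour scale just described: white at $30,000, red at $150,000, and grey where no figure was reported.  The linear ramp, the clamping, and the particular grey are assumptions for illustration, not necessarily what the map code does.

```python
def income_colour(median_income, low=30_000, high=150_000):
    """Map a median household income to (r, g, b): white at `low`, red at `high`,
    grey when no figure was reported (None)."""
    if median_income is None:
        return (128, 128, 128)
    # Clamp so values outside the range saturate at full white or full red.
    fraction = min(max((median_income - low) / (high - low), 0.0), 1.0)
    fade = round(255 * (1 - fraction))
    return (255, fade, fade)


if __name__ == "__main__":
    for income in (20_000, 30_000, 90_000, 150_000, None):
        print(income, income_colour(income))
```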


Terribly sorry..

Posted in Technology trends at 11:23 am by ducky

I’m really sorry, but I moved from http://webfoot.com/blog to blog.webfoot.com, and the users are (hopefully only temporarily) lost.  I’ll work on it, but it might be a little while.

Okay, I think users are back up.  Let me know.


Africa journal part 3: people

Posted in Travel at 7:36 pm by ducky

(See also part 1 and part 2 of my Africa postings.)

It’s sort of cliche to say, “the thing I liked best about country X was the people”.  We said that about the people in Quebec City, for example.  The people in Botswana, however, take it to a whole different level.

There is a concept in southern Africa whose name in Zulu you might be familiar with: ubuntu, or in Setswana botho.  It’s not just an operating system, it’s a philosophy of life, almost a religion.  The word encapsulates generosity, warmth, openness, and acceptance, but also an acknowledgment of the interconnectedness of all people.

Botho/ubuntu is highly valued in southern Africa; in North America, not so much.  I’m not saying that North Americans think that warmth, generosity, openness, and acceptance are bad things, just that they are not valued as highly as other things — like getting the job done quickly and cheaply.

One concrete manifestation of this is that all transactions start with an inquiry about the other: Hello/hello/how are you/fine, how are you/fine, thank you.  While this happens in North America, too, you by and large don’t say it to strangers.  In North America, when it is your turn to get served, you just say what you want: “One ticket to District 9, please.”  Not so in Botswana: all transactions get the full greeting.

Furthermore, while I can’t prove it, I think the Batswana (people of Botswana) mean it.

Valuing botho leads Batswana to be nice, but also it seems like they feel it is their patriotic duty to make sure that tourists have a good time.  Botswana has three basic sources of foreign currency: diamonds, cattle, and tourism.  That’s pretty much it, and everybody knows that.  Everybody understands that diamonds, cattle, and tourism are what funds their roads, their health system, their universities, etc.  Thus it is important to the health of their country that tourists keep coming back.

Botswana is also a very small country: 1.7 million people in a country roughly the size of Texas (which has 24 million people).  In addition to connections being important (see botho above), everybody knows everybody.  (We stopped at a random fast-food place in Francistown at one point, where our Motswana friend B. had lived for about four years, eight years ago.  Of the ten or so people in the restaurant, B. knew three.  At the hotel we stayed at in Kasane, B. knew one of the desk clerks.)  Even if a Motswana doesn’t work in tourism, somebody they know will work in tourism: their brother / sister-in-law / cousin / cousin’s husband’s nephew, somebody.

Finally, Botswana is a politically stable, relatively prosperous country.  Botho doesn’t get sacrificed to sectarian violence, nor to hunger.  I didn’t see any beggars in Botswana, and the only street hawker we saw had come over from Zimbabwe.  (South Africa and Zimbabwe are not prosperous countries, and we did see street hawkers in both of those countries.)  There was zero reason to fear getting beaten for my political beliefs, and I never worried about getting mugged.

I was profoundly affected by how nice the people in Botswana were.  It was a little bit jarring to come back to the USA, where I wasn’t really supposed to ask gate clerks how they were, and certainly wasn’t supposed to care.  I find myself much more wary in North America, with our high number of beggars.

I think that Canadians value botho a bit more than people in the US.  So oddly, I think that going to Africa made me a little bit more Canadian!


Progress! Including census tracts!

Posted in Hacking, Maps at 9:54 pm by ducky

It might not look like I have done much with my maps in a while, but I have been doing quite a lot behind the scenes.

Census Tracts

I am thrilled to say that I now have demographic data at the census tract level on my electoral map!  Unlike my old demographic maps (e.g. my old racial demographics map), the code is fast enough that I don’t have to cache the overlay images.  This means that I can allow people to zoom all the way out if they choose, while before I only let people zoom back to zoom level 5 (so you could only see about 1/4 of the continental US at once).

These speed improvements were not easy, and it’s still not super-fast, but it is acceptable.  It takes between 5 and 30 seconds to show a thematic map for 65,323 census tracts. (If you think that is slow, go to Geocommons, the only other site I’ve found to serve similarly complex maps on-the-fly.  They take about 40 seconds to show a thematic map for the 3,143 counties.)

A number of people have suggested that I could make things faster by aggregating the data — show things per-state when way zoomed out, then switch to per-county when closer in, then per-census tract when zoomed in even more.  I think that sacrifices too much.  Take, for example, these two slices of a demographic map of the percent of the population that is black.  The %black by county is on the left, the %black by census tract is on the right.  The redder an area is, the higher the percentage of black people is.

Percent of population that is black; by counties on left, by census tracts on the right

You’ll notice that the map on the right makes it much clearer just how segregated black communities are outside of the “black belt” in the South.  It’s not just that black folks are a significant percentage of the population in a few Northern counties, they are only significantly present in tiny little parts of Northern counties.  That’s visible even at zoom level 4 (which is the zoom level that my electoral map opens on).  Aggregating the data to the state level would be even more misleading.


Something else that you wouldn’t notice is that my site is now more buzzword-compliant!  When I started, I hard-coded the information layers that I wanted: what the name of the attribute was in the database (e.g. whitePop), what the English-language description was (e.g. “% White”), what colour mapping to use, and what min/max numeric values to use.  I now have all that information in an XML file on the server, and my client code calls to the server to get the information for the various layers with AJAX techniques.  It is thus really easy for me to insert a new layer into a map or even to create a new map with different layers on it.  (For example, I have dithered about making a map that shows only the unemployment rate by county, for each of the past twelve months.)
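
To give a feel for what such a layer file might hold, here is a minimal sketch.  The element names, attribute names, and the medianAge layer are invented for illustration (only whitePop and “% White” come from the text above), and the parsing is shown in Python rather than in the browser-side code that actually fetches the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical layer file; element and attribute names are invented for illustration.
LAYERS_XML = """
<layers>
  <layer attribute="whitePop" description="% White"
         minColour="ffffff" maxColour="ff0000" min="0" max="100"/>
  <layer attribute="medianAge" description="Median age"
         minColour="ffd700" maxColour="ff0000" min="25" max="45"/>
</layers>
"""


def load_layers(xml_text):
    """Parse the layer definitions into a list of plain dictionaries."""
    layers = []
    for node in ET.fromstring(xml_text).findall("layer"):
        layers.append({
            "attribute": node.get("attribute"),
            "description": node.get("description"),
            "colours": (node.get("minColour"), node.get("maxColour")),
            "range": (float(node.get("min")), float(node.get("max"))),
        })
    return layers


if __name__ == "__main__":
    for layer in load_layers(LAYERS_XML):
        print(layer["description"], layer["range"])
```

With the layers described as data, adding a layer means adding one element to the file rather than touching the client code.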

Some time ago, I also added the ability for me to specify how to calculate a new number from two different attributes.  Before, if I wanted to plot something like %white, I had to add a column to the database of (white population / total population) and map that.  Instead, I added the ability to do divisions on-the-fly.   Subtracting two attributes was also obviously useful for things like the difference in unemployment from year to year. While I don’t ever add two attributes together yet, I can see that I might want to, like to show the percentage of people who are either Evangelical or Mormon.  (If you come up with an idea for how multiplying two attributes might be useful, please let me know.)
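
A sketch of the idea, with made-up attribute names and numbers: the operation is looked up by name and applied to two attributes of a region, with missing data and division by zero returning nothing to plot.  This illustrates the technique and is not the site’s actual code.

```python
import operator

# Operations mentioned above: division and subtraction in use, addition foreseen.
OPERATIONS = {"divide": operator.truediv, "subtract": operator.sub, "add": operator.add}


def derived_value(row, first, second, operation):
    """Combine two attributes of one region's row; None if data is missing
    or the divisor is zero."""
    a, b = row.get(first), row.get(second)
    if a is None or b is None:
        return None
    if operation == "divide" and b == 0:
        return None
    return OPERATIONS[operation](a, b)


if __name__ == "__main__":
    # Made-up numbers for a single region.
    row = {"whitePop": 750, "totalPop": 1000, "unemp2008": 5.0, "unemp2009": 8.5}
    print(derived_value(row, "whitePop", "totalPop", "divide"))      # 0.75
    print(derived_value(row, "unemp2009", "unemp2008", "subtract"))  # 3.5
```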

Loading Data

Something else that isn’t obvious is that I have developed some tools to make it much easier for me to load attribute data.  I now use a data definition file to spell out the mapping between fields in an input data file and where the data should go in the database.  This makes it much faster for me to add data.
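
In spirit, a data definition is just a mapping from input columns to database columns.  The sketch below assumes a CSV input and uses invented column names (GEO_ID, POP_TOTAL, POP_WHITE, tract_id); the real definition file format is not shown in this post.

```python
import csv
import io

# Hypothetical data definition: input column name -> database column name.
DEFINITION = {"GEO_ID": "tract_id", "POP_TOTAL": "totalPop", "POP_WHITE": "whitePop"}


def rows_for_database(csv_text, definition):
    """Yield one dict per input row, keeping only mapped fields under their database names."""
    for record in csv.DictReader(io.StringIO(csv_text)):
        yield {db_col: record[src_col] for src_col, db_col in definition.items()}


# Made-up sample input; the NAME column is ignored because it is not mapped.
SAMPLE = "GEO_ID,NAME,POP_TOTAL,POP_WHITE\n06001400100,Tract 4001,2937,2110\n"

if __name__ == "__main__":
    print(list(rows_for_database(SAMPLE, DEFINITION)))
```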

The process still isn’t completely turnkey, alas, because there are a million-six different oddnesses in the data.  Here are some of the issues that I’ve faced with data that makes it non-straightforward:

  • Sometimes the data is ambiguous.  For example, a number of states have two jurisdictions with the same name: the census records Bedford City, VA and Bedford County, VA separately, but both are frequently just named “Bedford” in databases, so I have to go through by hand and figure out which Bedford it is and assign the right code to it.  (And sometimes when the code is assigned, it is wrong.)
  • Electoral results are reported by county everywhere except Alaska, where they are reported by state House district.  That meant that I had to copy the county shapes to a US federal electoral districts database, then delete all the Alaskan polygons, load up the state House district polygons, and copy those to the US federal electoral districts database.
  • I spent some time trying to reverse-engineer the (undocumented) Census Bureau site so that I could automate downloading Census Bureau data.  No luck so far.  (If you can help, please let me know!)  This means that I have to go through an annoyingly manual process to download census tract attributes.
  • Federal congressional districts have names like “CA-32” and “IL-7”, and the databases reflect that.  I thought I’d just use the state jurisdiction ID (the FIPS code, for mapping geeks) for two digits and two digits for the district ID, so CA-32 would turn into 0632 and IL-7 would turn into 1707.  Unfortunately, if a state has a small enough population, they only get one congressional rep; the data file had entries like “AK-At large” which not only messed up my parsing, but raised the question of whether at-large congresspeople should be district 0 or district 1.  I scratched my head and decided to assign 0 to at-large districts.  (So AK-At large became 0200.)  Well, I found out later that data files seem to assign at-large districts the number 1, so I had to redo it.
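
The congressional district numbering from the last item can be sketched like this.  The function name and the tiny FIPS table are for illustration only, and the at-large number is a parameter precisely because it was the part that had to be redone.

```python
# Subset of state FIPS codes, enough for the examples in the text.
STATE_FIPS = {"AK": "02", "CA": "06", "IL": "17"}


def district_id(name, at_large_number=1):
    """Turn 'CA-32' into '0632'; at-large districts get `at_large_number`."""
    state, _, district = name.partition("-")
    if district.strip().lower() == "at large":
        number = at_large_number
    else:
        number = int(district)
    return f"{STATE_FIPS[state]}{number:02d}"


if __name__ == "__main__":
    for name in ("CA-32", "IL-7", "AK-At large"):
        print(name, district_id(name))
```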

None of these data issues are hard problems, they are just annoying and mean that I have to do some hand-tweaking of the process for almost every new jurisdiction type or attribute.  It also takes time just to load the data up to my database server.

I am really excited to get the on-the-fly census tract maps working.  I’ve been wanting it for about three years, and working on it off and on (mostly off) for about six months.  It really closes a chapter for me.

Now there is one more quickie mapping application that I want to do, and then I plan to dive into adding Canadian information.  If you know of good Canadian data that I can use freely, please let me know.  (And yes, I already know about GeoGratis.)
