05.21.06

to-do manager

Posted in Technology trends at 8:44 pm by ducky

Last term, I wrote a to-do list manager in Scheme for a class project. I have been frustrated with all the to-do list managers out in the world, so I finally wrote my own.

(OSAF's Chandler might have some of the features that I want, but it's heavier-weight than I need and isn't really ready for prime time yet.)

The problem with most electronic to-do list managers is that they accumulate way too much stuff. I write down all the things I need to do, and then I am overwhelmed by the volume of tasks. Tasks that aren’t important or urgent or that I can’t work on right now end up getting in the way of seeing what I need to do right now.

I want to record that I need to take the car in for servicing in about three months, but I don’t want to see it until about three months from now. I want to write down that I need to paint the house, buy paint, buy rollers, and move furniture, but I don’t need to see the “move furniture” task until after I’ve bought the paint and rollers.

[Screenshot: to-do list manager]

In my to-do list manager, tasks are presented in a hierarchical structure, and I gave myself three different methods for hiding things:

  1. Hide completed tasks
  2. Hide supertasks (i.e. those that I can’t start on until some other task is finished)
  3. Hide deferred tasks (until some later date, specified on a per-task basis)

(I of course also have the option to show completed tasks, show deferred tasks, or show supertasks.)

If I hide completed, deferred, and supertasks, then what is left are the things that I can work on right now.
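Here is a minimal sketch of how those three filters combine, in Python rather than the original Scheme; the field and function names are my own, not the ones from the class project.

```python
# A sketch of the "show only what I can work on right now" filter.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Task:
    title: str
    done: bool = False
    deferred_until: Optional[date] = None        # hide until this date
    subtasks: List["Task"] = field(default_factory=list)

    def is_supertask(self) -> bool:
        # "supertask" here means a task with at least one unfinished subtask,
        # i.e. something I can't start on until some other task is finished
        return any(not t.done for t in self.subtasks)

def workable(task: Task, today: date) -> bool:
    """True if the task survives hiding completed, deferred, and supertasks."""
    if task.done:
        return False
    if task.deferred_until and task.deferred_until > today:
        return False
    if task.is_supertask():
        return False
    return True

def visible_tasks(tasks: List[Task], today: date) -> List[Task]:
    # walk the hierarchy and keep only the tasks I can work on right now
    shown = []
    for t in tasks:
        if workable(t, today):
            shown.append(t)
        shown.extend(visible_tasks(t.subtasks, today))
    return shown
```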

Note that there is no option to mark supertasks “done”. My Scheme version also doesn’t let you defer tasks with dependencies, but I haven’t decided if I am going to keep that or not.

I had thought that it would be nice to have separate importance and urgency fields, as those really are different things. Answering a ringing phone is very urgent but probably not very important (voicemail will pick it up); writing a will is very important but (hopefully!) not very urgent.

It turned out, however, that I didn’t really miss having distinct urgency and importance fields; if an item wasn’t urgent, I just deferred it. Presto, out of sight, out of mind.

One thing that I hadn't originally planned on, but which I did and liked, was to change the color of tasks based on how important they were rated. Very important tasks were deep blue, and as the tasks got less important, they got more and more washed out (less saturated).
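A rough sketch of that importance-to-color mapping, again in Python; the 0-10 importance scale and the exact shade of blue are guesses, not the original design.

```python
def importance_color(importance: int, max_importance: int = 10) -> str:
    """Deep blue at maximum importance, washed out toward white as it drops."""
    frac = max(0.0, min(1.0, importance / max_importance))
    deep_blue = (0, 0, 160)
    white = (255, 255, 255)
    r, g, b = (round(w + (d - w) * frac) for d, w in zip(deep_blue, white))
    return f"#{r:02x}{g:02x}{b:02x}"

# e.g. importance_color(10) -> a deep blue, importance_color(2) -> a pale blue
```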

My Scheme version has a text box on the main page for quickly creating new tasks, but it has a crucial flaw: there is no way to specify which task is the parent task. I thought about showing an arbitrary number next to each task in the list of tasks, and using that to specify the parent, but that didn't seem appropriate. Really what you need to be able to do is drag and drop tasks to different places in the list.

One thing that I thought of doing but didn’t get around to was to be able to expand/collapse branches of the tree. Thus if I just don’t feel like working on upgrading the family IT infrastructure today, I can collapse that task (and all its subtasks) down to one line.

I’ve started porting my to-do list manager, making it an AJAX application so that I can host it on my site at Dreamhost, but I’m not going to finish before I start at Google, alas.

Someday I’d like to integrate it with a calendar. (The Google calendar has an API; maybe I could connect to the calendar.) Someday I’d like to add optional at-location and with-person fields, so that I could ask what tasks I can do at e.g. the hardware store (like “buy paint”), or with e.g. Jim. But that will probably have to wait until after the summer is over.

05.01.06

Consumer-level grid computing

Posted in Technology trends at 5:35 pm by ducky

I am imagining a world with consumer-level grid computing. For example, imagine a maps server where in order to look at a map of Fargo, ND, you have to also serve those maps to other people who want to see Fargo, ND.

BitTorrent has shown that people are willing to exchange some of their resources for something that they value. BitTorrent uses bandwidth resources and not compute resources, but I don’t see why you couldn’t set up an application that used some compute resources as well.

I think of this not just because I’m working on a class assignment on grid computing, but also because I have a resource-pig of an application, namely my thematic maps of U.S. Census Bureau information.

While I haven't had a lot of time to devote to improving the performance, I figure that my server can only handle about 2000 users per day. (I'm only getting about 100 users per day, but I flatter myself by believing that when the world discovers the maps, traffic will shoot up.)

My husband observes that rendering Wikipedia pages into HTML is another data- and compute-intensive job that could be distributed. What if looking at some Wikipedia pages meant that you downloaded the raw format, rendered the page as HTML, and then served the HTML to other people who wanted to see it?

It might not be practical for Wikipedia: suppose I want to read about geoducks, and Bob is serving that Wikipedia page. Wikipedia has to send me enough information for me to find Bob. It might be that the work Wikipedia saves by having Bob serve the page just isn't worth the extra overhead of running a distributed application.
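To make the overhead question concrete, here is a sketch (in Python) of the bookkeeping the central site would need: a registry of which peers are currently serving which page, so that a request for the geoduck article can be redirected to Bob. Whether answering those lookups really costs less than just serving the page is exactly the open question.

```python
from collections import defaultdict
import random

class PeerRegistry:
    def __init__(self):
        # page title -> set of peer addresses currently serving it
        self.peers_for_page = defaultdict(set)

    def register(self, page: str, peer: str):
        # called when a reader finishes rendering a page and agrees to serve it
        self.peers_for_page[page].add(peer)

    def unregister(self, page: str, peer: str):
        self.peers_for_page[page].discard(peer)

    def lookup(self, page: str):
        # the central server only hands back a peer address;
        # it doesn't render or serve the page itself
        peers = self.peers_for_page.get(page)
        return random.choice(sorted(peers)) if peers else None

registry = PeerRegistry()
registry.register("Geoduck", "bob.example.net:8080")
print(registry.lookup("Geoduck"))    # -> "bob.example.net:8080"
```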

Another possibility would be render farms for amateur animated movies.  The more cycles you donate, the more frames you render, and the more of the film you get to see!

I have to believe that someday we will see distributed consumer applications that use both computing and bandwidth resources.

04.08.06

Secure, personalized RSS

Posted in Technology trends at 8:58 pm by ducky

I keep waiting for secure, personalized RSS to take over business-to-customer correspondence.

With so much spam arriving and so many anti-spam tools being pressed into service, companies with legitimate business find it difficult to communicate with their customers. A lot of legitimate business email looks very similar to spam, and phishers work very hard to make their messages indistinguishable from legitimate email.

Furthermore, email's architecture makes it very, very difficult to make messages secure and verifiable. Email is fundamentally a "push" architecture, where a message might go through a few servers before it gets to you. That makes it very difficult to tell who is really pushing the message to you.

There are also security concerns. Yes, there are schemes to encrypt messages, but you have to pass around and keep track of all the different keys you need; there are numerous possible points of failure.

RSS, despite people thinking of it as a "push" technology, is actually a "pull" technology under the covers. Your feed reader quietly checks a site every once in a while, and only tells you when there is something new. RSS also works over HTTP, so it can be served over secure HTTP (HTTPS).

The piece that’s missing is being able to tell who is connecting. I don’t know if any RSS readers have the ability to store and present a username/password pair to a feed source. I don’t know if there is support on the server side for keeping a record of who has seen what messages.

However, if/when the technology for both exists, then Wells Fargo could "send" me my mortgage bill by private, secure RSS. They would have to tell me (securely) what my personal RSS URL was, and I would have to enter my password information, but after that, all would be golden. My reader could check once a month, around the time my bill was ready, and Wells Fargo would see that I got the bill. Furthermore, if Wells Fargo saw that I did *not* pick up my bill electronically, they could then send it to me by snail mail.
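The reader side could be quite simple. Here is a sketch assuming the bank exposes a per-customer feed over HTTPS protected by HTTP Basic authentication; the URL and credentials are made up for illustration.

```python
import requests
import xml.etree.ElementTree as ET

FEED_URL = "https://feeds.example-bank.com/rss/customer/123456"  # hypothetical
USERNAME = "ducky"
PASSWORD = "secret"

def check_feed():
    # TLS protects the content; the username/password pair identifies the
    # customer, so the server can record that this bill was picked up.
    resp = requests.get(FEED_URL, auth=(USERNAME, PASSWORD), timeout=30)
    resp.raise_for_status()
    feed = ET.fromstring(resp.content)
    for item in feed.iter("item"):
        print(item.findtext("pubDate"), "-", item.findtext("title"))

if __name__ == "__main__":
    check_feed()
```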

No spam. No phishing. Security. Reliability. Non-repudiation. All good things.

03.21.06

eye trackers

Posted in Random thoughts, Technology trends at 10:46 pm by ducky

As I mentioned before, webcams have proliferated to the point where I’ve seen some built in to computers, and this makes me wonder if they could be used for eye trackers.

Apparently the best eye trackers bounce an infra-red beam off of your cornea, and use the location of the IR spot to tell where you are looking. There is some hearsay that the IR can damage the eyes, so maybe I wouldn't want IR pointing at my eyeballs the whole time I'm on my computer.

However, webcams are so cheap that heck, use two! I somehow think that being able to triangulate would help.

If you could do good passive eye tracking, there are cool things you could do, like depth-of-field. Imagine you’re playing a computer game, and the stuff in the distance is sharp when you are looking at it and blurry when you are looking at things in the foreground. How cool would that be?

03.11.06

Reading tools

Posted in Technology trends at 3:01 pm by ducky

As a graduate student, I have to read a lot of papers. I read a lot of them online in order to save trees.

It's a real pain. I periodically have to scroll the page, and I frequently lose my place when I do so. And if the paper has two columns, as academic papers usually do, then I have to scroll up to the top, then back down again.

[Figure: two-column layout, staggered]

I want two things.

1. I want software that will recognize when a PDF is two-column, and munge it so that it displays twice as many pages, aligned so that the column I'm currently reading sits in the center and I can read it straight down. Then "down" will always mean "in the direction I'm reading".

Why keep the full width of the page? Why not chop it in half? So that if there is something that spans both columns (like a figure), then I can still see it in its entirety. I might have to scroll horizontally, but I'm willing to live with that.
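Here is a sketch of what that munging might look like, using the pypdf library: each page is emitted twice on a slightly wider canvas, shifted so that first the left column and then the right column sits at the horizontal center. Nothing is cropped, so a full-width figure is still all there (modulo some horizontal scrolling). The half-page shift assumes a standard two-column layout; a real tool would have to detect the column positions.

```python
from pypdf import PdfReader, PdfWriter, Transformation

def stagger_columns(in_path: str, out_path: str):
    writer = PdfWriter()
    # read the file twice so the two copies of each page can be
    # transformed independently
    first, second = PdfReader(in_path), PdfReader(in_path)
    for left_copy, right_copy in zip(first.pages, second.pages):
        w = float(left_copy.mediabox.width)
        h = float(left_copy.mediabox.height)
        # widen both copies to 1.5x the original width
        left_copy.mediabox.upper_right = (1.5 * w, h)
        right_copy.mediabox.upper_right = (1.5 * w, h)
        # copy 1: shift content right by half a page so the LEFT column
        # lands at the horizontal center of the widened page
        left_copy.add_transformation(Transformation().translate(tx=w / 2, ty=0))
        writer.add_page(left_copy)
        # copy 2: leave content alone; the RIGHT column is already centered
        # on the widened page
        writer.add_page(right_copy)
        # (a PDF with a separate crop box would need the same widening)
    with open(out_path, "wb") as f:
        writer.write(f)

# stagger_columns("paper.pdf", "paper-staggered.pdf")
```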

2. I want an eyetracker to help me keep track of where I am on the page.

Webcams are now cheap and widespread. Some computers have them built in now. I want my webcam to track where I am in the text and to put a yellow dot there. If my eyes move quickly in a direction that doesn’t follow the line of the text, don’t move the dot.

If the eye tracker is good enough, as I get close to the bottom of the screen, move the page up a bit for me (but do it in a predictable way!) so that I can keep reading. If the eye tracker is not that accurate, I’ll turn the page myself, but I want the placemark to persist for a little bit when I move the page, so that I can find where I was.

One way of doing this would be to have the coloring be very faint if my eye is moving around, but the longer my gaze stays on one spot, the more the color “burns” into the page. After a small amount of time — say two seconds — the color starts to fade.
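A sketch of that burn-in-and-fade behavior; all of the rate constants and the dwell radius are guesses.

```python
class GazeMark:
    def __init__(self, burn_rate=0.8, fade_rate=0.5, hold_seconds=2.0,
                 dwell_radius=30.0):
        self.x = self.y = 0.0
        self.opacity = 0.0          # 0 = invisible, 1 = fully "burned in"
        self.dwell_time = 0.0
        self.burn_rate = burn_rate
        self.fade_rate = fade_rate
        self.hold_seconds = hold_seconds
        self.dwell_radius = dwell_radius   # pixels

    def update(self, gaze_x, gaze_y, dt):
        dist = ((gaze_x - self.x) ** 2 + (gaze_y - self.y) ** 2) ** 0.5
        if dist <= self.dwell_radius:
            # the gaze is staying near the mark: burn the color in,
            # then start fading after a couple of seconds
            self.dwell_time += dt
            if self.dwell_time <= self.hold_seconds:
                self.opacity = min(1.0, self.opacity + self.burn_rate * dt)
            else:
                self.opacity = max(0.0, self.opacity - self.fade_rate * dt)
        else:
            # a quick jump off the line of text: don't move the mark, just
            # let it fade slowly so I can still find my place
            self.dwell_time = 0.0
            self.opacity = max(0.0, self.opacity - 0.25 * self.fade_rate * dt)
            if self.opacity == 0.0:
                self.x, self.y = gaze_x, gaze_y
```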

Note that this colored dot does not have to be my cursor. If the eye tracking is good enough, then I could perhaps use the eye tracker as a cursor, using gaze position to give x-y coordinates. When I raise my eyebrows, register a mouse-down; when I drop them again, register a mouse-up.

03.05.06

Just how many computers does Google have?

Posted in Technology trends at 12:50 pm by ducky

Just how many computers does Google have?

Around the time of the IPO, Tristan Louis estimated that Google had between 32K and 79K computers, based on Google's IPO filing. He apparently looked at the $250M on the "property and equipment, net" line and used that for equipment costs, setting aside only $50M for employee computers, routers, switches, etc.

Does this make sense? My initial reaction was that it was a bit low: you also have to count desks, bookshelves, photocopiers, conference tables, phones, microwaves, frying pans, and volleyball nets. By comparison, Adobe (which has similar knowledge workers, also owns very little real estate, but has no data centers) has about $100M in property and equipment for 5,734 people, or about $17K per person. I think Google should be a little bit higher, since their equipment is newer and hasn't depreciated yet. In Google's filing, they list about 1,900 employees, which at $20K per person works out to $38M.

Then you need to add in the routers and racks and cables at the data center. According to a 2003 paper put out by Google, each rack has 80 computers, one 100-Mbps Ethernet switch, and one or two gigabit uplinks. Each data center has at least one gigabit switch that connects all the racks together. In addition, each data center has at least one hardware load balancer, and possibly more. (The papers about the Google architecture are ambiguous.) But even adding all that up, it still only comes to a few thousand dollars per data center.

Thus, $200M for data center computers is probably a reasonable figure. What about the per-computer costs? Those might be a bit lower than he assumed. Tristan Louis uses the figure given in the 2003 paper of $278K (2002 prices) for a rack, but that's if you buy from a vendor. Google builds its own, and just might get a volume discount on components, so they should be able to save 10-30%. Each rack would then cost $194K to $250K. At $200M for computers, this works out to 800 to 1030 racks of 80 computers each, or 64,000 to 82,000 computers. That's a lot.

How many do they have now? According to Google's most recent filing, they have $961M in property and equipment and 5,680 full-time employees, so they bought about $710M of new equipment for 3,780 new employees. $20K of equipment per employee means around $75M of employee equipment. Heck, let's round up and say that $110M was on non-data-center equipment. That leaves $600M for new racks.

For the same amount of money per computer that they spent in 2003, Google could get faster and more powerful computers, or they could get more computers. While their papers indicate that CPU speed isn't the most important thing to them, they do also mention that they would expect to see a big speedup from dual-core processors. Bigger disks also seem like they could be a win, so maybe Google spent something similar to what they did last time. Let's guess about $200K per rack. That means that since the IPO, they would have bought an additional 3000 racks, or about 240,000 *more* computers.
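For anyone who wants to check my arithmetic, here it is spelled out; the dollar figures are the estimates used in this post, not anything Google has published beyond its filings.

```python
RACK_SIZE = 80                      # computers per rack (2003 Google paper)

# Around the IPO (2004 filing):
ipo_data_center_budget = 200e6      # $250M property & equipment minus ~$50M overhead
rack_cost_low, rack_cost_high = 194e3, 250e3   # $278K vendor price, less 10-30%
racks_low = ipo_data_center_budget / rack_cost_high    # ~800 racks
racks_high = ipo_data_center_budget / rack_cost_low    # ~1030 racks
print(racks_low * RACK_SIZE, racks_high * RACK_SIZE)   # ~64,000 to ~82,000

# Since the IPO (most recent filing):
new_equipment = 961e6 - 250e6            # ~$710M of new property & equipment
new_rack_budget = new_equipment - 110e6  # generous allowance for non-rack gear
new_racks = new_rack_budget / 200e3      # ~3000 racks at ~$200K each
print(new_racks * RACK_SIZE)             # ~240,000 more computers
```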

I thus figure that they have 300,000 computers. At least.

Wow.

UPDATE: Someone asserted that the equipment figure was wrong — that Google had bought a lot of dark fibre, and that would chew up many millions of dollars. It appears that most of the chatter about Google buying dark fibre was due to Google posting an ad looking for someone to negotiate purchasing dark fibre leases. I sort of don't think that you buy fibre, due to the incredibly messy right-of-way issues. I think you lease fibre, and leases show up in a different line item on the financial statement.

UPDATE2: I didn’t figure in depreciation.  That would bump the number of computers up significantly.

02.23.06

More advice to Google about maps

Posted in Maps, Technology trends at 11:06 pm by ducky

Because all the data associated with Google Maps goes through Google, they can keep track of that information. If they wanted to, they could store enough information to tell you what map markers were within two miles of 1212 W. Springfield, Urbana, Illinois. Maybe one would be from Joe's Favorite Bars mashup and maybe one would be from the Museums of the World mashup. Maybe fifty would show buildings on the University of Illinois campus from the official UIUC mashup, and maybe two would be from Josie's History of Computing mashup.
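For instance, if Google kept a table of (mashup, marker, latitude, longitude) for every marker served through the Maps API, the query would be simple. A sketch, with a made-up table and rough coordinates:

```python
import math

def miles_between(lat1, lng1, lat2, lng2):
    # haversine distance in miles
    r = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# (mashup name, marker title, lat, lng) for markers Google has seen served
markers = [
    ("Joe's Favorite Bars", "Murphy's Pub", 40.1103, -88.2318),
    ("Museums of the World", "Krannert Art Museum", 40.1022, -88.2304),
    ("UIUC buildings", "Siebel Center", 40.1138, -88.2249),
]

def markers_near(lat, lng, radius_miles=2.0):
    return [m for m in markers if miles_between(lat, lng, m[2], m[3]) <= radius_miles]

# 1212 W. Springfield, Urbana is roughly here:
print(markers_near(40.1129, -88.2300))
```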

Google could of course then use that mashup data in their location-sensitive queries, so if I asked for "history computing urbana il", they would give me Josie's links instead of returning the Urbana Free Library. (They would need to be careful to do this in a way that didn't tromp on Josie, if they want to stick to their "Don't be evil" motto.)

This is another argument for why they should recognize a vested interest in making it easy for developers to add their own area-based data. If Google allows people to easily put up information about specific polygons, then Google can search those polygons. Right now, because I had to do my maps as overlays, Google can’t pull any information out of them.

If Google makes polygons and their corresponding data easy to name, identify, and access, they will be able to do very powerful things in the future.

Addendum: I haven’t reverse-engineered the Google Maps javascript — I realized that it’s quite possible that the marker overlays are all done on the client side.  (Desirable and likely, in fact.)  In that case, they wouldn’t have the data.  However, it would be trivial to insert some code to send information about the markers up to the server.  Would that be evil?  I’m not sure.

02.17.06

Disaster maps

Posted in Hacking, Maps, Technology trends at 2:27 pm by ducky

I was in San Jose when the 1989 Loma Prieta earthquake hit, and I remember that nobody knew what was going on for several days. I have an idea for how to disseminate information better in a disaster, leveraging the power of the Internet and the masses.

I envision a set of maps associated with a disaster: ones for the status of phone, water, natural gas, electricity, sewer, current safety risks, etc. For example, where the phones are working just fine, the phone map shows green. Where the phone system is up, but the lines are overloaded, the phone map shows yellow. Where the phones are completely dead, the phone map shows red. Where the electricity is out, the power map shows red.

To make a report, someone with knowledge — let’s call her Betsy — would go to the disaster site, click on a location, and see a very simple pop-up form asking about phone, water, gas, electricity, etc. She would fill in what she knows about that location, and submit. That information would go to several sets of servers (geographically distributed so that they won’t all go out simultaneously), which would stuff the update in their databases. That information would be used to update the maps: a dot would appear at the location Betsy reported.
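A sketch of what a single report might look like and how it could be pushed to several geographically separated servers; the field names, status values, and server URLs are all illustrative.

```python
import json
import urllib.request
from datetime import datetime, timezone

REPORT_SERVERS = [
    "https://disaster1.example.org/report",
    "https://disaster2.example.org/report",
    "https://disaster3.example.org/report",
]

def submit_report(lat, lng, statuses):
    """statuses maps a utility name to 'ok', 'degraded', or 'down',
    e.g. {'phone': 'degraded', 'electricity': 'down'}."""
    report = {
        "lat": lat,
        "lng": lng,
        "statuses": statuses,
        "reported_at": datetime.now(timezone.utc).isoformat(),
        "questioned_by": 0,   # bumped later if other users dispute the report
    }
    body = json.dumps(report).encode("utf-8")
    for url in REPORT_SERVERS:
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=10)
        except OSError:
            # a server may itself be down in a disaster; the others still get it
            pass

# Betsy, reporting on downtown San Jose:
submit_report(37.3337, -121.8907, {"phone": "down", "water": "ok", "gas": "ok"})
```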

How does Betsy connect to the Internet, if there’s a disaster?

  1. She can move herself out of the disaster area. (Many disasters are highly localized.) Perhaps she was downtown, where the phones were out, and then rode her bicycle home, to where everything was fine. She could report on both downtown and her home. Or maybe Betsy is a pilot and overflew the affected area.
  2. She could be some place unaffected, but take a message from someone in the disaster area. Sometimes there is intermittent communication available, even in a disaster area. After the earthquake, our phone was up but had a busy signal due to so many people calling out. What you are supposed to do in that situation is to make one phone call to someone out of state, and have them contact everybody else. So I would phone Betsy, give her the information, and have her report the information.
  3. Internet service, because of its very nature, can be very robust. I’ve heard of occasions where people couldn’t use the phones, but could use the Internet.

One obvious concern is spam and vandalism. I think Wikipedia has shown that, with the right tools, community involvement can keep spam and vandalism to a minimum. There would need to be a way for people to question a report and have that reflected in the map. For example, the dot for the report might become more transparent the more people questioned it.
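One possible mapping from "how many people have questioned this report" to the opacity of its dot, just to show how cheap the mechanism could be (the formula is arbitrary):

```python
def dot_opacity(questioned_by: int) -> float:
    # 1.0 for an unquestioned report, fading toward invisible as doubts pile up
    return 1.0 / (1 + questioned_by)
```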

The disaster site could have many more things on it, depending upon the type of disaster: aerial photographs, geology/hydrology maps, information about locations to get help, information about locations to volunteer help, topographic maps (useful in floods), etc.

What would be needed to pull this off?

  • At least two servers, preferably at least three, that are geographically separated.
  • A big honkin’ database that can be synchronized between the servers.
  • Presentation servers, which handle displaying the information. There could be a Google Maps version, a Yahoo Maps version, a Microsoft version, etc.
  • A way for the database servers and the presentation servers to talk to each other.
  • Some sort of governance structure. Somebody is going to have to make decisions about what information is appropriate for that disaster. (Hydrology maps might not be useful in a fire.) Somebody is going to have to be in communication with the presentation servers to coordinate presenting the information. Somebody is going to have to make final decisions on vandalism. This governance structure could be somebody like the International Red Cross or something like the Wikimedia Foundation.
  • Buy-in from various institutions to publicize the site in the event of a disaster. There's no point in the site existing if nobody knows about it, but if Google, Yahoo, MSN, and AOL all put up links to the site when a disaster hits, that would be excellent.

I almost did this as an MS thesis project, but decided against it, so I'm posting the idea here in the hopes that someone could run with it. I don't foresee having the time myself.

02.16.06

Advice to Google about maps and data

Posted in Maps, Technology trends at 10:40 pm by ducky

I have been working on a Google maps mashup that has been a lot of work. While I might be able to get some benefit from investing more time and energy in this, I kept thinking to myself, “Google could do this so much better themselves if they wanted to. They’ve got the API, they’ve got the bandwidth, they’ve got the computational horsepower.”

Here’s what I’d love to see Google do:

  1. Make area-based mashups easier. Put polygon generation in the API. Let me feed you XML of the polygon vertices, the data values, and the color mapping I want, and draw it for me; there's a sketch of what that XML might look like after this list. (Note that with version 2 of the API, it might work to use SVG for this. I have to look into that.)
  2. Make the polygons first-class objects in a separate layer with identities that can feed back into other forms easily. Let me roll over a tract and get its census ID. Let me click on a polygon and pop up a marker with all the census information for that tract.
  3. Make it easy to combine data from multiple sources. Let me feed you XML of census tract IDs, data values, and color mapping, and tell you that I want to use census tract polygon information (or county polygons, or voting precinct polygons, or …) from some other site, and draw it for me.
  4. Host polygon information on Google. Let me indicate that I want census tract polygons and draw them for me.
  5. Provide information visualization tools. Let me indicate that I want to see population density in one map, percent white in another, median income in a third, and housing vacancy rates in a fourth, and synchronize them all together. (I actually had a view like that working, but it is computationally expensive enough that I worry about making it available.) Let me do color maps in two or three dimensions, e.g. hue and opacity.
  6. Start hosting massive databases. Start with the Census data, then continue on to the Bureau of Labor Statistics, CIA factbook information, USGS maps, state and federal budgets, and voting records. Sure, the information is out there already, but it's in different formats in different places. Google is one of the few places that has the resources to bring it all together. They could make it easy for me to hook that data into information visualization tools.
  7. Get information from other countries. (This is actually tricky: sometimes governments copyright and charge money for their official data.)

Wouldn’t it be cool to be able to show an animation of the price of bread divided by the median income over a map of Europe from ten years before World War II to ten years after?

So how would Google make any money from this? The most obvious way would be to put ads on the sites that display the data.

A friend of mine pointed out that Google could also charge for the data in the same way that they currently charge for videos on Google Video. Google could either charge the visualization producers, who would then need to extract money from their consumers somehow, or they could charge the consumers of the visualizations.

Who would pay for this information? Politicians. Marketers. Disaster management preparedness organizations. Municipal governments. Historians. Economists. The parents of seventh-graders who desperately need to finish their book report. Lots of people.

02.10.06

Microsoft should fear Google, part 3

Posted in Email, Technology trends at 10:06 pm by ducky

I thought Google should start competing with Outlook; now they are doing it.
