03.21.06

eye trackers

Posted in Random thoughts, Technology trends at 10:46 pm by ducky

As I mentioned before, webcams have proliferated to the point where I’ve seen some built in to computers, and this makes me wonder if they could be used for eye trackers.

Apparently the best eye trackers bounce an infra-red beam off of your cornea, and use the location of the IR spot to tell where you are looking. There is some hearsay that the IR can damage the eyes, so maybe I wouldn’t want IR pointing at my eyeballs the whole time I’m on my computer.

However, webcams are so cheap that heck, use two! I somehow think that being able to triangulate would help.

If you could do good passive eye tracking, there are cool things you could do, like depth-of-field. Imagine you’re playing a computer game, and the stuff in the distance is sharp when you are looking at it and blurry when you are looking at things in the foreground. How cool would that be?

03.18.06

Single Operation Multiple Data

Posted in Hacking, Maps at 5:40 pm by ducky

One of the most venerable types of parallel processing is called SIMD, for Single Instruction Multiple Data. In those types of computers, you would do the exact same thing on many different pieces of data (like add two, multiply by five, etc) at the same time. There are some problems that lend themselves to SIMD processing very well. Unfortunately, there are a huge number of problems that do not lend themselves well to SIMD. It’s rare that you want to process every piece of data exactly the same.

Google has done a really neat thing with their architecture and software tools. They have abstracted things such that it looks to the developer like they have a single operation multiple data machine, where an operation can be something relatively complicated.

For example, to create one of my map tiles, I determine the coordinates of the tile, retrieve information about the geometry, retrieve information about the demographics, and draw the tile. With Google tools, once I have a list of tile coordinates, I could send one group of worker-computers (A group) off to retrieve the geometry information and a second (B group) off to retrieve the demographic information. Another group (the C group) could then draw the tiles. (Each worker in the C group would use data from exactly one A worker and one B worker.)

The A and B tasks are pretty simple, and maybe could be done by an old-style SIMD computer, but C’s job is much too complex to do in a SIMD computer. What steps are performed depends entirely on what is in the data. For a tile out at sea, the C worker doesn’t need to draw anything. For a tile in the heart of Los Angeles, it has to draw lots and lots of little polygons. But at this level of abstraction, I can think of “draw the tile” as one operation.

Under the covers, Google does a lot of work to make it look like everything is beautifully parallel. In reality, there probably aren’t as many workers as tiles, but the Google tools take care of dispatching jobs to workers until all the jobs are finished. To the developer, it all looks really clean and tidy.
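Here is a minimal sketch of the idea in Python. It uses the standard library's thread pool as a stand-in for Google's tools, which are not shown or reproduced here. The functions `fetch_geometry`, `fetch_demographics`, and `draw_tile`, and the fake data they return, are made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three kinds of work.
def fetch_geometry(tile):
    x, y = tile
    # Pretend that tiles with x == 0 are out at sea and have no polygons.
    return [] if x == 0 else [f"polygon-{x}-{y}-{i}" for i in range(x + y)]

def fetch_demographics(tile):
    x, y = tile
    return {"population": 1000 * x * (y + 1)}

def draw_tile(tile, geometry, demographics):
    # One "operation" whose actual steps depend entirely on the data.
    if not geometry:
        return f"tile {tile}: blank"
    return f"tile {tile}: {len(geometry)} polygons, pop {demographics['population']}"

def render_all(tiles, max_workers=4):
    # Fewer workers than tiles: the pool hands out jobs until all are done.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        geometry_jobs = pool.map(fetch_geometry, tiles)          # "A group"
        demographic_jobs = pool.map(fetch_demographics, tiles)   # "B group"
        geometries = list(geometry_jobs)
        demographics = list(demographic_jobs)
        # "C group": each draw uses exactly one A result and one B result.
        return list(pool.map(draw_tile, tiles, geometries, demographics))

tiles = [(x, y) for x in range(3) for y in range(2)]
results = render_all(tiles)
```

From the caller's side, `render_all(tiles)` looks like one operation applied to many pieces of data, even though a sea tile and a city tile do very different amounts of work inside `draw_tile`.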

There are way more problems that lend themselves to SOMD than to SIMD, so I think this approach has enormous potential.

03.14.06

Who are the maps for?

Posted in Maps, Random thoughts at 11:28 pm by ducky

As my maps approach something reasonable for public distribution, I’ve been talking to more people about them. People are starting to ask me, “Who do you think will use them? What do you think they will use them for?”

I’m not quite sure how to answer that. I imagine marketing people will be interested, though I have to believe that they already have this information.

Would researchers use it? Maybe for preliminary investigation, but I would hope they’d use ArcGIS for anything they want to publish. While the maps “look right” to me for most places I know about, there are a few places that don’t look right to me. ArcGIS is fundamentally better — they have many many more resources than I do to get things right.

The “value add” for my maps is not “better”, but “cheaper” and “more accessible”. Twelve-year-old Katie isn’t going to buy a copy of ArcGIS for her social studies class, but maybe she could use my maps for a report on the racial demographics of Texas. The Southern Poverty Law Center probably isn’t going to buy ArcGIS, but might go create a list of links to prisons to help people understand how African-Americans are hugely overrepresented in U.S. jails. Maybe Frieda and Joe will look at it to figure out what neighborhoods in Chicago they’d like to live in.

But my hunch is that most of the “use” won’t be obviously useful. I have certainly spent an awful lot of time just wandering around in the maps, exploring the demographics of my native country. Was this productive?

My maps aren’t very good for giving me answers, but they have given me lots of questions. Why are there so few rural blacks in Florida, when there are so many just across the border in Georgia? Why are there so few Latinos in East Texas compared to West Texas? Why is the median age so low on so many Native reservations? Why are there so many vacant housing units in northern Michigan and Minnesota?

However, I feel like these are good questions to have. Maybe I can’t articulate why I feel like a richer person for having explored U.S. demographics, but I absolutely do.

And if Katie, and Frieda, and Joe, and the Southern Poverty Law Center also feel enriched, then I will feel like I have succeeded.

03.11.06

Reading tools

Posted in Technology trends at 3:01 pm by ducky

As a graduate student, I have to read a lot of papers. I read a lot of them online in order to save trees.

It’s a real pain. I periodically have to scroll the page, and I frequently lose my place when I do so. And if there are two columns, as academic papers usually have, then I have to scroll up to the top, then back down again.

[Image: two-column layout, staggered]

I want two things.

1. I want software that will recognize when a PDF is two-column, and munge it so that it displays twice as many pages, but aligned so that I can read the center column straight down. Then “down” will always mean “in the direction I’m reading”.

Why keep the full width of the page? Why not chop it in half? So that if there is something that spans both columns (like a figure), then I can still see it in its entirety. I might have to horizontal scroll, but I’m willing to live with that.

2. I want an eyetracker to help me keep track of where I am on the page.

Webcams are now cheap and widespread. Some computers have them built in now. I want my webcam to track where I am in the text and to put a yellow dot there. If my eyes move quickly in a direction that doesn’t follow the line of the text, don’t move the dot.

If the eye tracker is good enough, as I get close to the bottom of the screen, move the page up a bit for me (but do it in a predictable way!) so that I can keep reading. If the eye tracker is not that accurate, I’ll turn the page myself, but I want the placemark to persist for a little bit when I move the page, so that I can find where I was.

One way of doing this would be to have the coloring be very faint if my eye is moving around, but the longer my gaze stays on one spot, the more the color “burns” into the page. After a small amount of time — say two seconds — the color starts to fade.

Note that this colored dot does not have to be my cursor. If the eye tracking is good enough, then I could perhaps use the eye tracker as a cursor, using gaze position to give x-y coordinates. When I raise my eyebrows up, register a click down; when I drop them again, register a click up.
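The “burn in, then fade” placemark described above could be modelled roughly like this in Python. This is only a sketch of one interpretation: the color holds for about two seconds after my gaze leaves the spot and then fades linearly. The class name `Placemark` and the burn and fade rates are made-up values, and no real eye-tracker input is involved.

```python
from dataclasses import dataclass

@dataclass
class Placemark:
    """Intensity (0.0 to 1.0) of the highlight at one spot on the page."""
    burn_rate: float = 0.5      # intensity gained per second of dwell (assumed)
    fade_rate: float = 0.25     # intensity lost per second once fading (assumed)
    hold_seconds: float = 2.0   # the "say two seconds" before fading starts
    intensity: float = 0.0
    time_since_gaze: float = 0.0

    def update(self, dt: float, gaze_on_spot: bool) -> float:
        """Advance the model by dt seconds and return the new intensity."""
        if gaze_on_spot:
            # The longer the gaze stays, the more the color burns in.
            self.time_since_gaze = 0.0
            self.intensity = min(1.0, self.intensity + self.burn_rate * dt)
        else:
            self.time_since_gaze += dt
            # Keep the mark for a while so I can find my place, then fade.
            if self.time_since_gaze > self.hold_seconds:
                self.intensity = max(0.0, self.intensity - self.fade_rate * dt)
        return self.intensity

mark = Placemark()
history = [
    mark.update(1.0, True),    # dwell for one second
    mark.update(1.0, True),    # dwell for another second
    mark.update(1.0, False),   # look away: still within the hold time
    mark.update(1.0, False),   # exactly at the hold time, not yet fading
    mark.update(1.0, False),   # past the hold time, starts to fade
]
```

A real version would call `update` once per video frame with whatever the tracker reports, and keep one such mark per region of the page.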

03.10.06

More snow

Posted in Canadian life, Random thoughts at 10:31 am by ducky

On Wednesday, I decided that I’d seen the last snow of the year.  It wasn’t actually snow — it was very small hail — but I decided that was good enough.

Yesterday (Thursday), there was a light snow that didn’t stick.  I decided that the hail from the previous day didn’t count.

Today, we woke up to big puffy fluffy snow that stuck.  So today I have to decide that snow only counts if it sticks.

03.08.06

Vancouver weather

Posted in Canadian life, Random thoughts at 7:51 pm by ducky

Today, as I walked to class, the rain turned to white stuff. I thought it was snow, but it was actually very small hail.

It was really miserable today — cold, wet, clammy, more cold, more wet, more clammy. The rooms I had my classes in were dry but cold, and my jeans took a long time to dry out as well. All day it rained, and pretty hard, too.

It reminded me a bit of Illinois, where I grew up and got my first two academic degrees. In late February, we’d have a streak of really nice weather, and everybody would start to get giddy at finally being done with winter. Then, after everybody took their winter clothes home over Spring Break, just as the daffodils would start to bloom, we’d get another snowfall. Everybody would get really depressed at the seemingly interminable winter.

This happened regularly enough over enough years that even as dense as I am, I learned that it always snows exactly once after Spring Break. So in late February, when everybody else would be dancing about, I’d still be casting suspicious gazes at the sky. I didn’t trust Mother Nature to keep up the good weather.

Sure enough, the week after Spring Break, it would snow.

While everybody else would be morose about the return of bad weather, I by contrast would be greatly relieved. Finally! The last snow! I could relax and look forward to the oncoming beautiful weather.

So today, even as I looked at the white precipitation in the palm of my glove, I was happy. I don’t know the Vancouver weather patterns, but I convinced myself that this was the “one last snow after Spring Break.”

I was relieved.

03.05.06

Just how many computers does Google have?

Posted in Technology trends at 12:50 pm by ducky

Just how many computers does Google have?

Around the time of the IPO, Tristan Louis estimated that Google had between 32K and 79K computers, based on Google’s IPO filing. He apparently looked at the $250M on the “property and equipment, net” line and used that for equipment costs, and only set aside $50M for employee computers, routers, switches, etc.

Does this make sense? My initial reaction was that it was a bit low: you also have to count desks, bookshelves, photocopiers, conference tables, phones, microwaves, frying pans, and volleyball nets. By comparison, Adobe, which has similar knowledge workers and also owns very little real estate, but has no data centers, has about $100M in property and equipment for 5,734 people, or about $17K per person. I think Google should be a little bit higher, since their equipment is newer and hasn’t depreciated yet. In Google’s filing, they list about 1,900 employees, which at $20K per person, works out to $38M.

Then you need to add in the routers and racks and cables at the data center. According to a 2003 paper put out by Google, each rack has 80 computers, and has one 100-Mbps Ethernet switch and one or two gigabit uplinks. Each data center has at least one gigabit switch that connects all the racks together. In addition, each data center has at least one hardware load balancer, and possibly more. (The papers about the Google architecture are ambiguous.) But even adding all that up, it still seems like a small number of thousands of dollars per data center.

Thus, $200M for data center computers is probably a reasonable figure. What about the per-computer costs? Those might be a bit low. Tristan Louis uses the figure given in the 2003 paper of $278K (2002 prices) for a rack, but that’s if you buy from a vendor. Google builds its own, and just might get a volume discount on components, so they should be able to save at least 10-30%. Each rack would then cost $194K to $250K. At $200M for computers, this works out to 800 to 1030 racks of 80 computers each, or 64,000 to 82,000 computers. That’s a lot.
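The arithmetic for the IPO-era estimate can be checked with a few lines of Python. The dollar figures are the ones quoted above; the function name `computers_at_ipo` is made up for this sketch, and the 10% and 30% discounts are guesses, not reported numbers.

```python
EQUIPMENT_TOTAL = 250_000_000    # "property and equipment, net" at the IPO
NON_DATA_CENTER = 50_000_000     # set aside for employee computers, routers, etc.
VENDOR_RACK_PRICE = 278_000      # 2002 vendor price per rack, from the 2003 paper
COMPUTERS_PER_RACK = 80

def computers_at_ipo(discount):
    """Estimate the computer count if Google saves `discount` off the vendor price."""
    rack_cost = VENDOR_RACK_PRICE * (1 - discount)
    racks = (EQUIPMENT_TOTAL - NON_DATA_CENTER) / rack_cost
    return racks * COMPUTERS_PER_RACK

low = computers_at_ipo(0.10)    # about 64,000 computers
high = computers_at_ipo(0.30)   # about 82,000 computers
```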

How many do they have now? According to Google’s most recent filing, they have $961M in property and equipment and 5,680 full-time employees, so they bought about $710M of new equipment for 3,780 new employees. $20K of equipment per employee means around $75M of employee equipment. Heck, let’s round up and say that $110M was on non-data center equipment. That leaves $600M for new racks.

For the same amount of money per computer that they spent in 2003, Google could get faster and more powerful computers, or they could get more computers. While their papers indicate that CPU speed isn’t the most important thing to them, they do also mention that they would expect to see a big speedup from dual-core processors. Bigger disks seem like they could also be a win, so maybe Google spent something similar to what they did last time. Let’s guess about $200K per rack. That means that since the IPO, they would have bought an additional 3000 racks, or about 240,000 *more* computers.
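The same back-of-the-envelope calculation for the post-IPO purchases, as a Python sketch. The inputs are the figures quoted above; the $110M for non-data center equipment and the $200K per rack are guesses, and the function name `additional_computers` is made up here. The unrounded difference in equipment is $711M rather than $710M, so the result comes out slightly above 240,000.

```python
def additional_computers(
    equipment_now=961_000_000,     # property and equipment, most recent filing
    equipment_at_ipo=250_000_000,  # property and equipment at the IPO
    non_data_center=110_000_000,   # guess: desks, employee computers, etc.
    rack_cost=200_000,             # guess: cost per rack
    computers_per_rack=80,         # from the 2003 paper
):
    """Estimate how many computers were bought since the IPO."""
    new_equipment = equipment_now - equipment_at_ipo
    racks = (new_equipment - non_data_center) / rack_cost
    return racks * computers_per_rack

more_computers = additional_computers()   # about 240,000
```

Changing the guessed `rack_cost` or `non_data_center` arguments shows how sensitive the estimate is to those two numbers.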

I thus count that they have 300,000 computers. At least.

Wow.

UPDATE: Someone asserted that the equipment figure was wrong — that Google had bought a lot of dark fibre, and that would chew up a lot of millions of dollars. It appears that most of the chatter about Google buying dark fibre was due to Google posting an ad looking for someone to negotiate purchasing dark fibre leases. I sort of don’t think that you buy fibre, due to the incredible messy right-of-way issues. I think you lease fibre, and leases show up in a different line item on the financial statement.

UPDATE2: I didn’t figure in depreciation.  That would bump the number of computers up significantly.