05.18.06

not talking

Posted in Married life at 10:14 am by ducky

On Monday and Tuesday, my husband, my mother, and I drove from Bellingham, WA to Oakland, CA. Mom was going down for a party and was interested in more drivers; Jim and I were interested in cargo space in her minivan, as we are moving to California for the summer. (I have a summer job at Google that, incidentally, I am hugely excited about.)

It was interesting how we didn’t talk much. Partly we were tired. Mom wasn’t feeling well. I was tired from final project frenzy, packing our stuff, and then helping Mom pack the next day. Jim was tired from packing and packing.

Partly it was hard to hold a conversation. The fan and/or air conditioner was blasting most of the way — it was quite a warm day. Add in that Mom is hard of hearing, and three-way conversations were tough.

And with Jim and me, we just didn’t have much to talk about. We’ve been together for ten years, had just spent three days near each other practically 24 hrs/day, and there are only so many observations one can make about how the terrain changes from forest to savannah to scruff while going from British Columbia through Oregon to California.

I used to worry about running out of things to say to Jim, but it was okay not talking. It wasn’t that we were avoiding talking to each other — we weren’t mad at each other. We weren’t frustrated at trying to find common ground or common interest. Rather, we are so much a part of each other’s lives, so intimately tangled that there just aren’t that many things that the other doesn’t already know about.

About the only thing we did talk about at length was Webfoot Maps. Because I’m going to Google this summer, I can’t work on my maps. Google owns my brain this summer.

I’m fine with trading working on my maps for working at Google. However, I get approached once or twice a month by people who want me to make a Google Maps mashup for them. Given that my husband is pretty unscheduled so far this summer, it seems like it would be useful if I could do a mind-meld with him so that he can help these people with their maps.

Thus I’ll be working at Google and Jim will be doing Google mashups. Avoiding the conflict-of-interest will be annoying.

I am not worried about whether we’ll be able to do it, however. This is the husband that I made sign an NDA before I would tell him what I did at Interval Research, after all. Two weeks after the Google Calendar showed up in my Google Trusted Tester account, I finally mentioned to Jim, “it’s too bad that you’re not in the Google Trusted Tester program, because there’s something that I’ve been testing for two weeks that I think you’d really like.” He responded, “I am in the Google Trusted Tester program. You mean you’ve been testing GCal too?”

So I’m sure we can keep a wall between our work; it’ll just mean we’ll have to find other things to talk about.

05.01.06

Consumer-level grid computing

Posted in Technology trends at 5:35 pm by ducky

I am imagining a world with consumer-level grid computing. For example, imagine a maps server where in order to look at a map of Fargo, ND, you have to also serve those maps to other people who want to see Fargo, ND.

BitTorrent has shown that people are willing to exchange some of their resources for something that they value. BitTorrent uses bandwidth resources and not compute resources, but I don’t see why you couldn’t set up an application that used some compute resources as well.

I think of this not just because I’m working on a class assignment on grid computing, but also because I have a resource-pig of an application, namely my thematic maps of U.S. Census Bureau information.

While I haven’t had a lot of time to devote to improving the performance, I figure that my server can only handle about 2000 users per day. (I’m only getting about 100 users per day, but flatter myself by believing that when the world discovers the maps, it will shoot up.)

My husband observes that rendering Wikipedia pages into HTML is another data- and compute-intensive job that could be distributed. What if looking at some Wikipedia pages meant that you downloaded the raw format, rendered the page as HTML, and then served the HTML to other people who wanted to see it?

It might not be practical for Wikipedia: suppose I want to read about geoducks, and Bob is serving that Wikipedia page. Wikipedia has to send me enough information for me to find Bob. It might be that the work Wikipedia saves by having Bob serve the page just isn’t worth the extra overhead of running a distributed application.

Another possibility would be render farms for amateur animated movies.  The more cycles you donate, the more frames you render, and the more of the film you get to see!

I have to believe that someday we will see distributed consumer applications that use both computing and bandwidth resources.

04.08.06

Secure, personalized RSS

Posted in Technology trends at 8:58 pm by ducky

I keep waiting for secure, personalized RSS to take over business-to-customer correspondence.

With so much spam arriving and so many anti-spam tools being pressed into service, companies with legitimate business find it difficult to communicate with their customers. A lot of legitimate business email looks very similar to spam, and phishers work very hard to make their messages indistinguishable from legitimate email.

Furthermore, because of email’s fundamental architecture, it is very, very difficult to make messages secure and verifiable. Email is fundamentally a “push” architecture, where the message might go through a few servers before it gets to you. That makes it very difficult to tell who is really pushing the message to you.

There are also security concerns. Yes, there are schemes to encrypt messages, but you have to pass around and keep track of all the different keys you need; there are numerous possible points of failure.

RSS, despite people thinking of it as a “push” technology, is actually a “pull” technology under the covers. Your feed reader quietly goes and checks a site every once in a while, and only tells you when there is something new. RSS also works over HTTP, so it can be done over secure HTTP.

The piece that’s missing is being able to tell who is connecting. I don’t know if any RSS readers have the ability to store and present a username/password pair to a feed source. I don’t know if there is support on the server side for keeping a record of who has seen what messages.

However, if/when the technology for both exists, then Wells Fargo could “send” me my mortgage bill by private, secure RSS. They would have to tell me (securely) what my personal RSS URL was, and I would have to enter my password information, but after that, all would be golden. My reader could check once per month at the time that my bill was ready, and Wells Fargo would see that I got the bill. Furthermore, if Wells Fargo saw that I did *not* pick up my bill electronically, they could then send it to me by snail mail.
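As a rough illustration of the reader side, here is a minimal Python sketch of a feed reader presenting a stored username/password pair over secure HTTP. It uses standard HTTP Basic authentication, which is only one possible scheme; the feed URL, the username, the password, and the function name `build_feed_request` are all made up for the example, and nothing here reflects what any real bank or feed reader does.

```python
import base64
import urllib.request


def build_feed_request(feed_url, username, password):
    """Build (but do not send) a request for a private, per-customer feed."""
    if not feed_url.startswith("https://"):
        # Credentials should only ever travel over secure HTTP.
        raise ValueError("private feeds must use https://")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(feed_url)
    # HTTP Basic authentication lets the server tell who is pulling the feed.
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical personal feed URL and credentials, for illustration only.
request = build_feed_request(
    "https://example.com/feeds/customer-12345.rss", "ducky", "not-a-real-password"
)
print(request.get_header("Authorization"))
```

A reader would then hand the request to `urllib.request.urlopen` on its usual polling schedule. Because every fetch is authenticated, the server would be in a position to record which customer picked up which item.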

No spam. No phishing. Security. Reliability. Non-repudiation. All good things.

03.21.06

eye trackers

Posted in Random thoughts, Technology trends at 10:46 pm by ducky

As I mentioned before, webcams have proliferated to the point where I’ve seen some built in to computers, and this makes me wonder if they could be used for eye trackers.

Apparently the best eye trackers bounce an infra-red beam off of your cornea, and use the location of the IR spot to tell where you are looking. There is some hearsay that the IR can damage the eyes, so maybe I wouldn’t want IR pointing at my eyeballs the whole time I’m on my computer.

However, webcams are so cheap that heck, use two! I somehow think that being able to triangulate would help.

If you could do good passive eye tracking, there are cool things you could do, like depth-of-field. Imagine you’re playing a computer game, and the stuff in the distance is sharp when you are looking at it and blurry when you are looking at things in the foreground. How cool would that be?

03.18.06

Single Operation Multiple Data

Posted in Hacking, Maps at 5:40 pm by ducky

One of the most venerable types of parallel processing is called SIMD, for Single Instruction Multiple Data. In those types of computers, you would do the exact same thing on many different pieces of data (like add two, multiply by five, etc) at the same time. There are some problems that lend themselves to SIMD processing very well. Unfortunately, there are a huge number of problems that do not lend themselves well to SIMD. It’s rare that you want to process every piece of data exactly the same.

Google has done a really neat thing with their architecture and software tools. They have abstracted things such that it looks to the developer like they have a single operation multiple data machine, where an operation can be something relatively complicated.

For example, to create one of my map tiles, I determine the coordinates of the tile, retrieve information about the geometry, retrieve information about the demographics, and draw the tile. With Google tools, once I have a list of tile coordinates, I could send one group of worker-computers (A group) off to retrieve the geometry information and a second (B group) off to retrieve the demographic information. Another group (the C group) could then draw the tiles. (Each worker in the C group would use data from exactly one A worker and one B worker.)

The A and B tasks are pretty simple, and maybe could be done by an old-style SIMD computer, but C’s job is much too complex to do in a SIMD computer. What steps are performed depends entirely on what is in the data. For a tile out at sea, the C worker doesn’t need to draw anything. For a tile in the heart of Los Angeles, it has to draw lots and lots of little polygons. But at this level of abstraction, I can think of “draw the tile” as one operation.

Under the covers, Google does a lot of work to make it look like everything is beautifully parallel. In reality, there probably aren’t as many workers as tiles, but the Google tools take care of dispatching jobs to workers until all the jobs are finished. To the developer, it all looks really clean and tidy.
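To make the idea concrete, here is a small Python sketch of the tile example using the standard library's thread pool. It is not Google's tooling, only an analogy: the three functions are stand-ins with made-up data, and the pool plays the role of the dispatcher that hands jobs to a small number of workers until every tile is done.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_geometry(tile):
    """A-group job: stand-in that returns polygons for a tile (made-up data)."""
    x, y = tile
    # Pretend tiles with x == 0 are out at sea and have no polygons.
    return [] if x == 0 else [f"polygon-{x}-{y}-{i}" for i in range(x + y)]


def fetch_demographics(tile):
    """B-group job: stand-in that returns one statistic per tile."""
    x, y = tile
    return {"median_age": 30 + x + y}


def draw_tile(geometry, demographics):
    """C-group job: how much work it does depends on what is in the data."""
    if not geometry:
        return "blank tile"
    return f"{len(geometry)} polygons shaded for median age {demographics['median_age']}"


def render_all(tiles, workers=4):
    """Apply each operation to every tile; the pool hands jobs to idle workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # For simplicity the A and B stages run one after the other here.
        geometry = list(pool.map(fetch_geometry, tiles))
        demographics = list(pool.map(fetch_demographics, tiles))
        # Each draw job uses the output of exactly one A job and one B job.
        return list(pool.map(draw_tile, geometry, demographics))


tiles = [(0, 0), (1, 2), (3, 1)]
for tile, result in zip(tiles, render_all(tiles)):
    print(tile, result)
```

From the caller's side, `draw_tile` looks like one operation applied to many pieces of data, even though the sea tile does almost nothing and the other tiles do more work.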

There are way more problems that lend themselves to SOMD than to SIMD, so I think this approach has enormous potential.

03.14.06

Who are the maps for?

Posted in Maps, Random thoughts at 11:28 pm by ducky

As my maps approach something reasonable for public distribution, I’ve been talking to more people about them. People are starting to ask me, “Who do you think will use them? What do you think they will use them for?”

I’m not quite sure how to answer that. I imagine marketing people will be interested, though I have to believe that they already have this information.

Would researchers use it? Maybe for preliminary investigation, but I would hope they’d use ArcGIS for anything they want to publish. While the maps “look right” to me for most places I know about, there are a few places that don’t look right to me. ArcGIS is fundamentally better — they have many many more resources than I do to get things right.

The “value add” for my maps is not “better”, but “cheaper” and “more accessible”. Twelve-year-old Katie isn’t going to buy a copy of ArcGIS for her social studies class, but maybe she could use my maps for a report on the racial demographics of Texas. The Southern Poverty Law Center probably isn’t going to buy ArcGIS, but might go create a list of links to prisons to help people understand how African-Americans are hugely overrepresented in U.S. jails. Maybe Frieda and Joe will look at it to figure out what neighborhoods in Chicago they’d like to live in.

But my hunch is that most of the “use” won’t be obviously useful. I have certainly spent an awful lot of time just wandering around in the maps, exploring the demographics of my native country. Was this productive?

My maps aren’t very good for giving me answers, but they have given me lots of questions. Why are there so few rural blacks in Florida, when there are so many just across the border in Georgia? Why are there so few Latinos in East Texas compared to West Texas? Why is the median age so low on so many Native reservations? Why are there so many vacant housing units in northern Michigan and Minnesota?

However, I feel like these are good questions to have. Maybe I can’t articulate why I feel like a richer person for having explored U.S. demographics, but I absolutely do.

And if Katie, and Frieda, and Joe, and the Southern Poverty Law Center also feel enriched, then I will feel like I have succeeded.

03.11.06

Reading tools

Posted in Technology trends at 3:01 pm by ducky

As a graduate student, I have to read a lot of papers. I read a lot of them online in order to save trees.

It’s a real pain. I periodically have to scroll the page, and I frequently lose my place when I do so. And if there are two columns, as academic papers usually have, then I have to scroll up to the top, then back down again.

[Image: two-column layout, staggered]

I want two things.

1. I want software that will recognize when a PDF is two-column, and munge it so that it displays twice as many pages, but aligned so that I can read the center column straight down. Then “down” will always mean “in the direction I’m reading”.

Why keep the full width of the page? Why not chop it in half? So that if there is something that spans both columns (like a figure), then I can still see it in its entirety. I might have to horizontal scroll, but I’m willing to live with that.

2. I want an eyetracker to help me keep track of where I am on the page.

Webcams are now cheap and widespread. Some computers have them built in now. I want my webcam to track where I am in the text and to put a yellow dot there. If my eyes move quickly in a direction that doesn’t follow the line of the text, don’t move the dot.

If the eye tracker is good enough, as I get close to the bottom of the screen, move the page up a bit for me (but do it in a predictable way!) so that I can keep reading. If the eye tracker is not that accurate, I’ll turn the page myself, but I want the placemark to persist for a little bit when I move the page, so that I can find where I was.

One way of doing this would be to have the coloring be very faint if my eye is moving around, but the longer my gaze stays on one spot, the more the color “burns” into the page. After a small amount of time — say two seconds — the color starts to fade.
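One possible reading of that burn-in and fade behavior, sketched in Python. The function name and the `burn_time` and `fade_time` values are invented for illustration, and the sketch assumes the two-second hold starts when the gaze leaves the spot; only the two seconds comes from the description above.

```python
def dot_opacity(dwell, away, burn_time=1.0, hold_time=2.0, fade_time=1.0):
    """Opacity (0.0 to 1.0) of the placemark dot.

    dwell: seconds the gaze rested on the spot
    away:  seconds since the gaze left the spot (0 while still looking)
    """
    # The longer the gaze stays, the more the color burns in, up to fully opaque.
    burned = min(dwell / burn_time, 1.0)
    # Hold the color for hold_time after the gaze leaves, then fade linearly.
    fading = max(away - hold_time, 0.0)
    remaining = max(1.0 - fading / fade_time, 0.0)
    return burned * remaining


print(dot_opacity(dwell=0.5, away=0.0))  # glancing: faint
print(dot_opacity(dwell=2.0, away=1.0))  # just looked away: still solid
print(dot_opacity(dwell=2.0, away=2.5))  # past the hold time: fading
```

A quick glance leaves only a faint mark, while a spot that was actually being read stays visible long enough to find again after scrolling.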

Note that this colored dot does not have to be my cursor. If the eye tracking is good enough, then I could perhaps use the eye tracker as a cursor, using gaze position to give x-y coordinates. When I raise my eyebrows up, register a click down; when I drop them again, register a click up.

03.10.06

More snow

Posted in Canadian life, Random thoughts at 10:31 am by ducky

On Wednesday, I decided that I’d seen the last snow of the year.  It wasn’t actually snow — it was very small hail — but I decided that was good enough.

Yesterday (Thursday), there was a light snow that didn’t stick.  I decided that the hail from the previous day didn’t count.

Today, we woke up to big puffy fluffy snow that stuck.  So today I have to decide that snow only counts if it sticks.

03.08.06

Vancouver weather

Posted in Canadian life, Random thoughts at 7:51 pm by ducky

Today, as I walked to class, the rain turned to white stuff. I thought it was snow, but it was actually very small hail.

It was really miserable today — cold, wet, clammy, more cold, more wet, more clammy. The rooms I had my classes in were dry but cold, and my jeans took a long time to dry out as well. All day it rained, and pretty hard, too.

It reminded me a bit of Illinois, where I grew up and got my first two academic degrees. In late February, we’d have a streak of really nice weather, and everybody would start to get giddy at finally being done with winter. Then, after everybody took their winter clothes home over Spring Break, just as the daffodils would start to bloom, we’d get another snowfall. Everybody would get really depressed at the seemingly interminable winter.

This happened regularly enough over enough years that even as dense as I am, I learned that it always snows exactly once after Spring Break. So in late February, when everybody else would be dancing about, I’d still be casting suspicious gazes at the sky. I didn’t trust Mother Nature to keep up the good weather.

Sure enough, the week after Spring Break, it would snow.

While everybody else would be morose about the return of bad weather, I by contrast would be greatly relieved. Finally! The last snow! I could relax and look forward to the oncoming beautiful weather.

So today, even as I looked at the white precipitation in the palm of my glove, I was happy. I don’t know the Vancouver weather patterns, but I convinced myself that this was the “one last snow after Spring Break.”

I was relieved.

03.05.06

Just how many computers does Google have?

Posted in Technology trends at 12:50 pm by ducky

Just how many computers does Google have?

Around the time of the IPO, Tristan Louis estimated that Google had between 32K and 79K computers, based on Google’s IPO filing. He apparently looked at the $250M on the “property and equipment, net” line and used that for equipment costs, and only set aside $50M for employee computers, routers, switches, etc.

Does this make sense? My initial reaction was that it was a bit low: you also have to count desks, bookshelves, photocopiers, conference tables, phones, microwaves, frying pans, and volleyball nets. By comparison, Adobe, which has similar knowledge workers and also owns very little real estate, but has no data centers, has about $100M in property and equipment for 5,734 people, or about $17K per person. I think Google should be a little bit higher, since their equipment is newer and hasn’t depreciated yet. In Google’s filing, they list about 1,900 employees, which at $20K per person, works out to $38M.

Then you need to add in the routers and racks and cables at the data center. According to a 2003 paper put out by Google, each rack has 80 computers, and has one 100-Mbps Ethernet switch and one or two gigabit uplinks. Each data center has at least one gigabit switch that connects all the racks together. In addition, each data center has at least one hardware load balancer, and possibly more. (The papers about the Google architecture are ambiguous.) But even adding all that up, it still seems like a small number of thousands of dollars per data center.

Thus, $200M for data center computers is probably a reasonable figure. What about the per-computer costs? Those might be a bit low. Tristan Louis uses the figure given in the 2003 paper of $278K (2002 prices) for a rack, but that’s if you buy from a vendor. Google builds its own, and just might get a volume discount on components, so they should be able to save at least 10-30%. Each rack would then cost $194K to $250K. At $200M for computers, this works out to 800 to 1030 racks of 80 computers each, or 64,000 to 82,000 computers. That’s a lot.
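The arithmetic in that estimate can be checked with a few lines of Python. The numbers are the ones given above; the function name and variable names are just for the sketch, and the results come out slightly different from the rounded figures in the text.

```python
EQUIPMENT_BUDGET = 200_000_000  # dollars assumed spent on data center computers
VENDOR_RACK_PRICE = 278_000     # dollars per rack at 2002 vendor prices
COMPUTERS_PER_RACK = 80


def computers_for(budget, rack_price):
    """Return (racks, computers) that a budget buys at a given rack price."""
    racks = budget / rack_price
    return racks, racks * COMPUTERS_PER_RACK


for discount in (0.10, 0.30):
    rack_price = VENDOR_RACK_PRICE * (1 - discount)
    racks, computers = computers_for(EQUIPMENT_BUDGET, rack_price)
    print(
        f"{discount:.0%} discount: ${rack_price:,.0f} per rack, "
        f"{racks:,.0f} racks, {computers:,.0f} computers"
    )
```

The smaller discount gives the low end of the range (about 800 racks) and the larger discount gives the high end (about 1030 racks).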

How many do they have now? According to Google’s most recent filing, they have $961M in property and equipment and 5,680 full time employees, so they bought about $710M of new equipment for 3780 new employees. $20K of equipment per employee means around $75M of employee equipment. Heck, let’s round up and say that $110M was on non-data center equipment. That leaves $600M for new racks.

For the same amount of money per computer that they spent in 2003, Google could get faster and more powerful computers, or they could get more computers. While their papers indicate that CPU speed isn’t the most important thing to them, they do also mention that they would expect to see a big speedup from dual-core processors. Bigger disks also seem like they could be a win, so maybe Google spent something similar to what they did last time. Let’s guess about $200K per rack. That means that since the IPO, they would have bought an additional 3000 racks, or about 240,000 *more* computers.
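The post-IPO numbers can be run the same way. This Python sketch uses only the figures quoted above (the $110M allowance and the $200K rack price are the guesses from the text, not known values); without the intermediate rounding it lands a hair above 3000 racks.

```python
IPO_EQUIPMENT = 250_000_000      # "property and equipment, net" at the IPO
CURRENT_EQUIPMENT = 961_000_000  # same line in the most recent filing
IPO_EMPLOYEES = 1_900
CURRENT_EMPLOYEES = 5_680
EQUIPMENT_PER_EMPLOYEE = 20_000  # dollars, estimated from the Adobe comparison
NON_DATA_CENTER = 110_000_000    # rounded-up allowance for everything else
RACK_PRICE = 200_000             # guessed dollars per rack
COMPUTERS_PER_RACK = 80

new_equipment = CURRENT_EQUIPMENT - IPO_EQUIPMENT
new_employees = CURRENT_EMPLOYEES - IPO_EMPLOYEES
employee_equipment = new_employees * EQUIPMENT_PER_EMPLOYEE

# Whatever is left after the non-data-center allowance goes to racks.
new_racks = (new_equipment - NON_DATA_CENTER) // RACK_PRICE
new_computers = new_racks * COMPUTERS_PER_RACK

print(f"new equipment: ${new_equipment:,}")
print(f"new employees: {new_employees:,} (about ${employee_equipment:,} of equipment)")
print(f"new racks: {new_racks:,}, new computers: {new_computers:,}")
```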

I thus count that they have 300,000 computers. At least.

Wow.

UPDATE: Someone asserted that the equipment figure was wrong: that Google had bought a lot of dark fibre, and that would chew up a lot of millions of dollars. It appears that most of the chatter about Google buying dark fibre was due to Google posting an ad looking for someone to negotiate purchasing dark fibre leases. I sort of don’t think that you buy fibre, due to the incredibly messy right-of-way issues. I think you lease fibre, and leases show up in a different line item on the financial statement.

UPDATE2: I didn’t figure in depreciation.  That would bump the number of computers up significantly.
