Ken Church and James Hamilton have a blog posting, Diseconomies of Scale, which doesn’t seem right to me.
They suggest that it is more expensive to run a data center than to buy a bunch of condos, put 48 servers in each, wall up the servers, rent out the condo, and use the servers as a distributed data farm.
Church and Hamilton figure that the capital cost of a set of condos is about half of the cost of a data center. However, they use a sales price of $100K for the condos, but according to a news release issued by the US National Association of Realtors, the median metro-area condo price in the US was $226,800 in 2007 Q2.
Perhaps you could get cheaper condos by going outside the major metro areas, but you might not be able to get as good connectivity outside the major metro areas. I don’t have a figure for the median condo price in cities of population between 50,000 and 100,000, but that would be an interesting number.
A rack of 48 servers draws 12kW (at 250W/server, their number). For a house wired for 110V, this means the current draw would be about 109A. That is roughly five times what a normal 20A household circuit can carry, so you would need to do significant rewiring. I don’t have an estimate of that cost.
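The electrical arithmetic is straightforward enough to sketch (the per-server wattage is Church and Hamilton's; the 20A circuit rating is my assumption about typical North American household wiring):

```python
# Back-of-the-envelope check of the rack's electrical load:
# 48 servers at 250 W each (Church and Hamilton's figure) on a 110 V supply.
servers = 48
watts_per_server = 250
volts = 110

total_watts = servers * watts_per_server   # 12,000 W
amps = total_watts / volts                 # ~109 A

# A typical North American household circuit is breakered at 15-20 A.
circuits_needed = amps / 20                # ~5.5 twenty-amp circuits
print(f"{total_watts} W -> {amps:.0f} A, ~{circuits_needed:.1f} twenty-amp circuits")
```

At 109A, the rack alone would exceed even a 100A fusebox, which is why the panel upgrade discussed below matters.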
Furthermore, my construction-industry source says that most houses can’t stand that much extra load. He says that older houses generally have 75A fuseboxes; newer houses have 100A to them. He says that it is becoming more common for houses to get 200A fuseboxes, but only for the larger houses.
While it is possible to upgrade, “the cost of [upgrading from a 100A fusebox to a 200A fusebox] could be as high as $2000.” I suspect that an upgrade of this size would also need approval from the condo board (and in Canada, where residential marijuana farms are somewhat common, suspicion from the local police department).
If, as Church and Hamilton suggest, you “wall in” the servers, then you will need to do some planning and construction to build the enclosure for the servers. I can’t imagine that would cost less than $1000, and probably more like $2000.
They also don’t figure in the cost to pay someone (salary+plane+rental car+hotel) to evaluate the properties, bid on them, and buy them. I wouldn’t be surprised if this added $10K to each property. While there are also costs associated with acquiring property for one large data center, those costs should be lower than for acquiring a thousand condos.
Bottom line: I believe the capital costs would be higher than they expect.
For their comparison, Church and Hamilton put 54,000 servers either in one data center or in 1125 condos. They calculate that the power cost for the data center option is $3.5M/year. By contrast, they figure that the condo power consumption is $10.6M/year, but is offset by $800/month in condo rental income at 80% occupancy, for $8.1M in income. I calculate that 800*1125*12*.8 is $8.6M, and will use that number, giving a net operating cost of $2M/year, or $1.5M/year less than a data center.
Given a model with 1125 condos, this works out to $111 per condo per month. Put another way, if their estimate is off by $111 per condo per month, a data center is cheaper.
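The operating-cost arithmetic can be sketched directly (the power and rent figures are Church and Hamilton's; note that the unrounded numbers give about $114 per condo per month, versus ~$111 from the rounded $1.5M):

```python
# Sketch of the condo-scenario operating numbers as I read them.
condos = 1125
power_cost = 10.6e6            # $/year to power 54,000 servers in condos
rent_per_month = 800           # $ net of condo dues
occupancy = 0.80

rental_income = rent_per_month * condos * 12 * occupancy  # $8.64M/year
net_cost = power_cost - rental_income                     # ~$1.96M/year

datacenter_power = 3.5e6                                  # $/year, their figure
margin = datacenter_power - net_cost                      # ~$1.54M/year in the condos' favour
per_condo_per_month = margin / condos / 12                # ~$114 unrounded
print(f"income ${rental_income/1e6:.2f}M, net cost ${net_cost/1e6:.2f}M, "
      f"margin ${per_condo_per_month:.0f}/condo/month")
```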
Implicit in their model is that nobody will ever have to lay hands on a server rack. If you have to send someone out to a condo to reboot the rack, that will cost you. The cost will depend upon how far the condo is from the technician’s normal location, how easy it is to find the condo, how agreeable the tenant is to having someone enter their condo, and so on. I believe this number would be significantly larger than zero. You might be able to put something in the lease about the tenant agreeing to reboot the servers on request, but this inconvenience will come at some price, either as reduced monthly rent or as a per-incident payment to the tenant.
I believe that the $800/month rental income ($1000/month rent less $200/month condo dues) is overly optimistic.
The median rent for a metro-area apartment was $1324 in 2007, but that’s for all housing types, not just condos. The median rent nationally in 2007 was $665. If you want the metro income, you need to pay the metro purchase price of $227K; if you want to save money by going outside the metro areas, you won’t get the high rent.
Eyeballing a few cities on Craigslist, it looked to me like the metro rent for two-bedroom condos was about $800/month.
Church and Hamilton also didn’t account for:
- management fees (~5-7% of gross rent per month, or $50-$70 on income of $1000)
- property taxes (~2%/year of purchase price, so $160/month if he finds condos for $100K)
- maintenance that isn’t covered by the condo association like paint, new carpeting, and the hole the tenant punched in the wall ($40/month)
- reduction in rent for part of the condo being walled off and/or inconvenience of rebooting the servers (~$50/month)
- Liability insurance. If a short in the servers burns the condo complex down, that’s bad.
- Internet connectivity
While they didn’t account for Internet connectivity in either the data center scenario or the condos scenario, this is an area where there seem to be large economies of scale. For example, a T3 (44.736 Mbit/s) seems to cost between $7,500 and $14K/month, or between $167 and $312/Mbit/s/month. A T1 (1.536 Mbit/s) seems to cost between $550 and $1200/month, or between $358 and $781/Mbit/s/month. The T1 is thus roughly twice as expensive per bit as the T3. I don’t know how much connectivity 54,000 servers would need, but I expect it would be significant, and expect that it would be significantly more expensive in 1125 smaller lots.
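Normalizing those quoted prices to dollars per megabit makes the economy of scale visible (prices are the 2007-era ranges quoted above, nothing more):

```python
# Rough per-megabit comparison of the circuit prices quoted above.
t3_mbps, t3_low, t3_high = 44.736, 7500, 14000   # T3: $/month range
t1_mbps, t1_low, t1_high = 1.536, 550, 1200      # T1: $/month range

t3_per_mbit = (t3_low / t3_mbps, t3_high / t3_mbps)  # ~$168-$313 /Mbit/s/month
t1_per_mbit = (t1_low / t1_mbps, t1_high / t1_mbps)  # ~$358-$781 /Mbit/s/month

# Comparing midpoints, the T1 costs roughly 2.4x as much per bit.
ratio = (sum(t1_per_mbit) / 2) / (sum(t3_per_mbit) / 2)
print(f"T3 ${t3_per_mbit[0]:.0f}-${t3_per_mbit[1]:.0f}, "
      f"T1 ${t1_per_mbit[0]:.0f}-${t1_per_mbit[1]:.0f}, ratio {ratio:.1f}x")
```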
I can imagine there being some significant additional costs in time and/or money:
- Obstreperous condo boards. If the condo board passes a rule against having server farms in your unit, you’re screwed.
- Obstreperous zoning boards. If the city decides that the server farm is part of a business, they might get unhappy about it being in a building zoned residential.
- Criminal tenants. What’s to stop your tenants from busting into the server closet and stealing all the servers?
Church and Hamilton close their article by saying, “whenever we see a crazy idea even within a factor of two of what we are doing today, something is wrong”. I think they are correct, and that their analysis is overly simplistic and optimistic.
There’s a cool paper on a tool to do semi-automatic debugging: Triage: diagnosing production run failures at the user’s site. While Triage was designed to diagnose bugs at a customer site (where the software developers don’t have access to either the configuration or the data), I think a similar tool would be very valuable even for debugging in-house.
They use a number of different techniques to debug C++ code.
- Checkpoint the program’s execution at a number of points.
- Attempt to reproduce the bug. This tells whether it is deterministic or not.
- Analyze memory by walking the heap and stack to find possible corruptions.
- Roll back to previous checkpoints and rerun, looking for buffer overflows, dangling pointers, double frees, data races, semantic bugs, etc.
- Fuzz the inputs: intentionally vary the inputs, thread scheduling, memory layouts, signal delivery, and even control flows and memory states to narrow down the conditions that trigger the failure, for easy reproduction.
- Compare the code paths from failing replays and non-failing replays to determine what code was involved in that failure.
- Generate a report. This gives information on the failure and a suggestion of which lines to look at to fix it.
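The replay-comparison step can be illustrated with a toy sketch (this is my illustration, not Triage's actual algorithm or data structures): treat each replay as the set of code locations it touched, and flag locations that appear in every failing run but no passing run.

```python
# Toy illustration of comparing failing and passing replays (NOT
# Triage's actual implementation): each run is the set of
# "function:line" locations it executed. Locations are invented.
failing_runs = [
    {"parse:12", "parse:19", "copy:88", "free:91"},
    {"parse:12", "parse:19", "copy:88", "free:91", "log:3"},
]
passing_runs = [
    {"parse:12", "parse:19", "log:3"},
    {"parse:12", "log:3"},
]

in_every_failure = set.intersection(*failing_runs)
in_any_pass = set.union(*passing_runs)
suspects = in_every_failure - in_any_pass   # executed only when the run fails
print(sorted(suspects))                     # -> ['copy:88', 'free:91']
```

A report generator could then point the programmer at exactly those lines.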
They did a user study and found that programmers took 45% less time to debug when they used Triage than when they didn’t for “real” bugs, and 18% for “toy” bugs. (“…although Triage still helped, the effect was not as large since the toy bugs are very simple and straightforward to diagnose even without Triage.”)
It looks like the subjects were given the Triage bug reports before they started work, so the time that it takes to run Triage wasn’t factored into the time it took. The time it took Triage to run was significant (up to 64 min for one of the bugs), but presumably the Triage run would be done in background. I could set up Triage to run while I went to lunch, for example.
This looks cool.
I did a very quick, informal survey on how people use tabs when looking at Web search results. Some people immediately open all the search results that look interesting in new tabs, then explore them one by one (“open-parallel”). Others open one result in a new tab, explore it, go back to the search page, then open the second result in another tab, etc. (“open-sequential”). Note that the “open-sequential” people can have lots of tabs open at a time; they just open them one by one.
To clarify, open-parallel means control-clicking on the URL for result #1, then on result #2, then on #3, then on #4, and only THEN switching to the tab for #1, examining it, switching to the tab for #2, etc. Open-sequential means control-clicking on the URL for result #1, switching to the tab for #1, examining #1, switching to the search results page, control-clicking on #2, switching to the tab for #2, examining #2, switching to the search results page, etc.
I was surprised to find that the people who had been in the US in the early 2000s were far more likely to use the open-parallel strategy. There was an even stronger correlation with geekdom: all of the geeks used the open-parallel strategy, and only two of the non-geeks did.
[Survey table: for each respondent, where they were in the early 2000s, whether they were a geek, and which strategy they used. Only the location column survived: of the sixteen responses, nine were working or studying in the US, five in Canada, one in Australia (?), and one in Europe (?).]
Notes on the survey:
- The subject pool is not representative of the general population: everyone who answered lives or lived at my former dorm at UBC, has a bachelor’s degree, and all but one have an advanced degree or are working on one.
- I classified people as geeks if they had had Linux on at least one of their computers and/or had worked in the IT industry. The person with the “sort-of” in the geek column doesn’t qualify on any of those counts, but was a minor Internet celebrity in the mid 90s.
What does this mean?
I’m not sure, but I have a few ideas:
- I suspect that the geeks are more likely to have used a browser with modern tabbing behaviour much earlier, so have had more years to adapt their strategies. (Internet Explorer got modern tabbing behaviour in 2006; Mozilla/Firefox got it in 2001.)
- One of the benefits of the open-parallel strategy is that pages can load in the background. Maybe in 2001, Web access was slow enough that this mattered. Maybe it’s not that the geeks have been using tabs longer, but that they started using tabs when the Internet was slow.
- It might be that the geeks do more Web surfing than the non-geeks, so have spent more time refining their Internet behaviour.
There are a zillion graphical IDEs out there, and I really don’t want to download and try each one. I don’t even want to try twenty of them. So, dear readers, care to help me?
All the IDEs that I’ve seen have a main center panel with the source of the file you’re looking at. Above that, they seem to all have a row of tabs, one per file that you’ve opened. (Does anybody have anything different?)
Here is a link to a screenshot of Eclipse. (Sorry, it was too wide for my blog.) Eclipse puts a little “>>” to show that there are more files (as pointed to by the ugly black arrow), with the number of hidden tabs next to it (“4” in this case). Clicking on the “>>4” gives a drop-down menu (shown here in yellow).
What happens in other IDEs when you have more tabs than will fit across the horizontal width of the source file? How does your IDE handle it? Or does your IDE have a different tabbing model altogether, e.g. it opens one window per file? I would greatly appreciate hearing from you.
You can either email me ducky at webfoot dot com or post in the comments; I’ll wait a few days and then summarize in another blog posting. Remember to tell me the name of your IDE.
I just read a preprint of a paper that talks about a feature that gives the user unprompted feedback on the user’s work. This reminded me of Clippy, which people absolutely hated.
Why did people hate Clippy so much? I think it was a status issue. Your computer — which presumably is low-status compared to you — was having the temerity to tell you what to do. We humans have a hard enough time receiving criticism from above us. I believe that criticism from below can infuriate people, especially if there is no way to punish the subordinate for the insubordination.
Larissa Tiedens’ research says that people presume that high-status people do good things, and low-status people do bad things.
Thus, I think it would be good to design software that is not just user-friendly, but obsequious. Instead of “Error 39 — bad input”, it should say, “I’m sorry, I’m not smart enough to understand the input that you gave me.” Instead of “You would do better if you do X”, it should say, “Sorry to bother you, but I noticed that you are doing Y. You might find you have better luck if you do X.”
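A minimal sketch of what such an obsequious wrapper might look like (the error codes, templates, and wording below are all invented for illustration):

```python
# Sketch: map terse internal error codes to deferential user-facing
# messages. Codes and phrasing are made up for this example.
APOLOGETIC = {
    "bad_input": ("I'm sorry, I'm not smart enough to understand "
                  "the input that you gave me."),
    "better_way": ("Sorry to bother you, but I noticed that you are "
                   "doing {observed}. You might find you have better "
                   "luck if you {suggestion}."),
}

def polite(code, **details):
    """Return a deferential user-facing message for a terse error code."""
    return APOLOGETIC[code].format(**details)

print(polite("bad_input"))
print(polite("better_way", observed="Y", suggestion="do X"))
```

The point is less the mechanism than the register: the software takes the blame, rather than assigning it.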
(And if you think people don’t treat computers like they do people, go read The Media Equation. Byron Reeves and Clifford Nass’ research is really fascinating.)
As I mentioned before, a long time ago I posted my vision that university education would be split up into many pieces, with content delivery, accreditation, tutoring, etc. all available separately.
Today I read an article about how MIT is celebrating getting at least part of the content of 1,800 courses online. Cooool.
UBC CS Prof. Rachel Pottinger‘s door has an article by Rachel Maines asking Why Women Become Veterinarians but Not Engineers. Fifty years ago, both were highly male-dominated fields. Today, women get about 3/4 of Vet Med degrees, but only about 1/5 of CS degrees. Maines doesn’t have an answer, but she does a good job of making the question interesting.
Right after I read Maines’ article, I read an article titled Is There Anything Good About Men by Roy F. Baumeister. It probably would have been better titled, “What are Men Good For?” The answer in the picture he paints is “taking risks.” He acknowledges that at the top end of society, men dominate. To a very good first approximation, men are in charge: presidents, CEOs, generals, and Nobel Prize winners are usually men.
However, he points out that men are overrepresented at both ends of society. He says that prisoners, the homeless, and people killed on the job (including soldiers) are also usually men. Interestingly, both the Nobel prize winners and the mentally retarded are more often male than female.
He goes on to develop his thesis more, but the basic idea is that men and women might have the same average ability at something, but that the distribution is usually much “fatter” for men than women. There are more men taking risks than women. Sometimes they succeed wildly; sometimes they fail wildly. Women hold down the middle ground, neither failing nor succeeding spectacularly.
Now go back to CS vs. Vet Med. I contend that CS has a much higher risk associated with it than Vet Med. If you don’t keep right on top of emerging computing technologies, it is really easy to get obsoleted in CS. The whole industry has changed several times in the past twenty years. Meanwhile, the architecture of the dog has not changed much in the past 200 years.
Even if you stay current with computing technologies, you aren’t guaranteed safe harbour during the high-tech world’s booms and busts. There is always the threat that someone else will release a product that will put you out of business, in part because the cost of distributing the product is so low. It is hard to imagine, however, how Microsoft could release a new product that would eliminate the need for someone to put antiseptic on Fido’s cut. The “distribution cost” of applying a bandage is very high.
The high-tech world is also more sensitive to fluctuations in consumer tastes and consumer confidence. While someone might delay buying an iPhone because they were nervous about their job getting cut, very few people euthanize their cat because money is tight.
It might be, then, that one way to make CS more attractive to women would be to make it less risky. Unfortunately, even though I have a pretty good imagination, I can’t think of how to make the high-tech world less risky.
As I work, I find myself asking one question over and over: “How do I get information from point A to point B?” For example, “How do I get the position of the active editor from EditorSashContainer into TabBehaviourAutoPin?”
It seems like there is probably room for a tool that figures out a path from point A to point B.
This might also make the code more maintainable — instead of me going and making new methods in the chain from A to B through six intermediate methods, maybe the tool can find a path that goes through only two classes.
I can imagine the tool giving me the shortest path through public methods, and (if it exists) the shortest path that requires private variables or methods. Maybe it would find that there is a private variable that, if I made a public getter, I could use.
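At heart, the tool I'm imagining is a shortest-path search over a graph of "type, method, result type" edges. A toy sketch (the class and method names below are invented for illustration, not real Eclipse API):

```python
# Toy sketch of finding the shortest call chain from type A to type B
# via breadth-first search. Class/method names are invented.
from collections import deque

# edges[type] = [(method_name, result_type), ...]
edges = {
    "EditorSashContainer": [("getActiveWorkbenchPage", "Page")],
    "Page": [("getActivePart", "Part")],
    "Part": [("getSite", "Site")],
    "Site": [("getId", "String")],
}

def shortest_chain(start, goal):
    """Return the shortest list of method calls leading from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for method, result in edges.get(node, []):
            if result not in seen:
                seen.add(result)
                queue.append((result, path + [method]))
    return None   # no chain of public methods reaches the goal

print(shortest_chain("EditorSashContainer", "Site"))
# -> ['getActiveWorkbenchPage', 'getActivePart', 'getSite']
```

A real tool would build the edge list from the program's type signatures, and could run the search twice: once over public methods only, and once allowing private members.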
UPDATE: my supervisor (“advisor” in the US) pointed me at a paper describing a system (Prospector) to do just that.
Alas, at first I wasn’t able to find any code for it. It turns out the code is here, and there is also a web interface (which knows about the J2SE 1.4, Eclipse 3.0, and Eclipse GEF source). The plugin also unfortunately seems to be too downrev for me. 🙁
There is also a tool called Strathcona which does something sort of similar — it finds examples of existing code that goes from A to B. I don’t think that would have helped me with the specific things I was looking for, because I don’t think there was any existing code anywhere that did all of what I wanted to do. It might have helped me get from A to C and then from C to B, however.
I’ve been working with an open-source visualization library called prefuse for a while. It’s used quite a bit, but mostly for graph visualization. I’m trying to use it for chart visualization. (Why? Because I also want to do graph visualization, and I figured — perhaps wrongly — that it would be better to learn the tao of one library well than two poorly.)
There are almost no examples out in the wild of how to do charts with prefuse. Here, then, is a link to ScatterPlotWithAxisLabels.java. Humans, you probably don’t care about this; it’s just here so the robots can find it.
It is a variation on the program ScatterPlot, but with axes labelled. You wouldn’t think that would be a big deal, but there are a lot of little things you have to get right, and with few examples, it is hard to know what you have to specify and what is the default behaviour.
(Ooops, I wrote this a while ago and forgot to post it.)
In my recent post, Linux on the desktop, I mentioned that oocalc and/or gnumeric had let me down six months ago when I was working with an admittedly challenging spreadsheet. (It contained LOTS of obscure fonts from around the world.)
Within three days, I got a posting from one of the maintainers of gnumeric, asking me for more information. This is why Windows is doomed. I can’t imagine getting email from someone at Microsoft asking me for more information about a bug based on a posting in what is a pretty obscure blog.
Unfortunately, my problems were such that I couldn’t write a good bug report on it. (If I could have, I would have done so at the time. I consider writing bug reports one of the obligations of using open-source software.) At the time, there was a long, long delay between whatever-I-did-to-corrupt-the-file and my discovery of the corruption. My bug report would have said something like, “I worked for three hours, saving regularly, and at the end of the three hours, I discovered that my file was corrupt.” Alas, that kind of bug report is probably worse than no bug report, as the best that a triager could do is say WORKSFORME.
I did go back through my notes, and it looks like oocalc was the original offender, and that I switched to gnumeric at least briefly. I didn’t see anything in my notes saying that gnumeric let me down. However, I don’t see any .gnumeric files in the directory, and I would think that if gnumeric had been working smoothly for me, I would have left at least some .gnumeric files around.
Note, though, that I had these troubles in November 2006, which is about 56 dog-months ago. I would be surprised if they had made no progress since I had trouble, and (to be fair!), the spreadsheet that I worked on was a very challenging one.