Jim Glanz, writing in the New York Times this past Sunday, described existing inefficiencies in Internet infrastructure, but omitted important context that can help interested readers really understand the problem. The article, in which I’m quoted, is Glanz, James. 2012. “Power, Pollution, and the Internet.” New York Times. New York, NY. September 23. p. A1. A related “Room for Debate” section (in which I have an article) went online on Monday, September 24, 2012.
The article conflates different types of data centers, and in the process creates a misleading impression for readers who are not familiar with this industry. I like to divide the industry into four kinds of data centers: public cloud computing providers (like Amazon, Google, Facebook, and Microsoft), scientific computing centers (like those at national laboratories and universities), co-location facilities (which house servers owned by other companies), and what I call “in-house” data centers (which are facilities owned and operated by companies whose primary business is not computing). The fourth category is by far the dominant one in terms of floor area and total electricity use, and almost all the issues raised in the article apply most clearly to facilities in that category.
Each category of data centers has very different characteristics and constraints. The scientific computing category is in a class by itself, because it runs computing jobs that can be queued up, so these facilities do not need to respond to changes in demand. The other three categories must respond in real time, which requires some slack in the system in case of unanticipated changes in demand. That’s why quoting the 96.4% utilization of LBNL’s supercomputer in July 2012 (as the article does) says nothing about the possibilities for increased utilization in the vast majority of data centers.
The public cloud providers are much more efficient than the “in-house” and co-location facilities. One implication of the NYT article (as expressed, for example, in quotes from Hank Seader and Randall Victora) is that we’ll be using the computing resources one way or another, so it doesn’t matter where they are housed. This conclusion is incorrect. The low utilization numbers cited in the NYT article generally apply to the “in-house” and co-location facilities, not to the cloud providers (who have many more and different kinds of users, so utilization is generally much higher). The infrastructure efficiencies in cloud computing facilities are higher as well. For example, the Power Usage Effectiveness (PUE) in typical “in-house” data centers is between 1.8 and 1.9, while for cloud facilities it is closer to 1.1. A PUE of 1.1 means that for every 1 kWh used in IT equipment, only 0.1 kWh is used for cooling, fans, pumps, power distribution, and other infrastructure, compared with 0.8 to 0.9 kWh in a typical “in-house” facility. So it really matters whether IT resources sit in cloud computing data centers or in standard “in-house” facilities, and the problems identified in the article matter mainly in the “in-house” facilities.
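To make the PUE arithmetic in the previous paragraph concrete, here is a minimal Python sketch. It assumes the standard definition of PUE as total facility energy divided by IT equipment energy, so the infrastructure overhead per kWh of IT load is simply PUE minus one. The function names and the 1,000,000 kWh annual IT load are hypothetical, chosen only for illustration; the PUE values are the ones quoted above.

```python
# Sketch only: assumes PUE = total facility energy / IT equipment energy.
# Function names and the example IT load are hypothetical illustrations.

def overhead_kwh_per_it_kwh(pue: float) -> float:
    """Infrastructure (non-IT) energy used per 1 kWh of IT energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def total_facility_kwh(it_kwh: float, pue: float) -> float:
    """Total facility energy needed to deliver a given IT load at a given PUE."""
    return it_kwh * pue


# PUE values quoted in the post: 1.8 to 1.9 for in-house, about 1.1 for cloud.
cases = [("in-house (1.8)", 1.8), ("in-house (1.9)", 1.9), ("cloud (1.1)", 1.1)]

IT_LOAD_KWH = 1_000_000.0  # hypothetical annual IT load, same for every case

for label, pue in cases:
    overhead = overhead_kwh_per_it_kwh(pue)
    total = total_facility_kwh(IT_LOAD_KWH, pue)
    print(f"{label}: {overhead:.1f} kWh overhead per IT kWh, "
          f"{total:,.0f} kWh total for the same IT load")
```

Note that this holds the IT load fixed, so it captures only the infrastructure side of the argument. The post also points out that cloud facilities run their servers at higher utilization, which would shrink the IT load itself; the sketch does not model that effect.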
There are good reasons why cloud providers are more efficient, including economies of scale, diversity and aggregation of users, flexibility of operations, and ease of sidestepping organizational constraints. There is also an underlying driver of greater efficiency that is critically important: the cloud providers have fixed the internal institutional problems that lead to separate budgets for the IT and facilities departments (split incentives) and to dispersed responsibility for data center design, construction, and operations. The vast majority of “in-house” and co-location facilities have not fixed these problems, so efficiency is not high on (or not even on) the priority list. And it’s institutional rather than technical failures (the lack of proper cost allocation, management responsibility, and inventory tracking) that result in a large number of “comatose” servers, for example.
The problem is that the people who run the “in-house” and co-location data centers have little influence on these institutional issues. It’s the people at the C-level in the corporation (CEO, CFO, CIO) who need to make these changes happen, and thus far there’s been little movement there in most companies. That’s the biggest challenge, and it’s one I wish the article had highlighted. Once these problems are fixed, big changes in efficiency follow rapidly and continue apace (they become part of the business culture and drive continuous improvements).
The article also ignores the value of the services being produced by data centers, which is the key reason why so many data centers have been built in the first place. The value is so much higher than the costs that the inefficiencies in the “in-house” facilities are tolerated as long as reliability is maintained.
The article and the associated “Room for Debate” section seem to imply that consumers’ and companies’ demand for instantly available information is to blame for the industry’s obsession with “uptime”. But the demand for information can be met in many ways; the real issue is how the industry chooses to satisfy that demand, not the nature of the demand itself. There are ways to deliver information with comparable levels of “uptime” at much lower cost and energy use (as the cloud computing providers have demonstrated), and we need to find ways for such innovations to be adopted in all “in-house” data centers.
Another (less important) issue I have with the article is that it uses the word “cloud” in its colloquial sense, i.e., anything on the other side of the user’s wall is “the cloud”. In this context, however, it is more important to distinguish “cloud computing” from the other types of data centers I list above, because cloud data centers are designed and operated quite differently from those other types. That’s the distinction that matters for understanding this issue, and the colloquial use of “cloud” just confuses people.
If you’ve already read the NYT article, I urge you to examine it again after reading this blog post. Distinguishing between different types of facilities should yield crucial insight into why these inefficiencies exist and what we can do about them. I’m interested to hear your thoughts.