Our newest work on comatose/zombie servers, out this week
One of the most surprising things about the data center industry is how cavalier it is about the number of servers sitting around using electricity but doing nothing. We call such servers “comatose”, or more colorfully, “zombies”.
In 2015 we did our first study of this issue, a granular analysis of a small data sample (4,000 servers) covering a six-month period in 2014, using data from TSO Logic. Now we’re back with a sample four times bigger, covering six months in 2015, and with additional detail on the characteristics of virtual machines.
My colleague at Anthesis, Jon Taylor (with whom I conducted the study), wrote up a nice summary of the work here. You can also download the study at that link.
Here are a few key paragraphs:
Two years on the data set from which the original findings were drawn has grown from 4,000 physical servers to more than 16,000 physical servers and additional information on 32,000 virtual machines (VM) running on hypervisors. The new findings show improvements, as well as an alarming wake-up call.
On the upside: when an enterprise acted to remove physical zombie servers when presented with evidence of the problem’s magnitude, they were able to reduce the amount from 30 percent to eight percent in just one year. On the downside: new data show that some 30 percent of VMs are zombies, demonstrating that the same discovery, measurement, and management challenges that apply to physical servers also apply to VMs.
The study confirms that the issue is still not being adequately addressed. New data indicates that one quarter to one third of data center investments are tied up with zombie servers, both physical and virtual. Virtualization without improved measurement technologies and altered institutional practices is not a panacea. Without visibility into the scale of these wasted resources the problem will continue to challenge the data center industry.
Here’s a key graph from the report:
There are some complexities in comparing the new data with the older data, because one facility in the 2014 sample decided not to allow its data to be used for the 2015 sample. The remaining facilities in the 2014 sample, when shown evidence that one third of their servers were comatose, took action and moved from more than 30% comatose to 8% comatose in just one year.
We corrected for these changes in an attempt to estimate the percentage of comatose servers for enterprises that haven’t dealt with the problem. The result is an estimate that about one quarter of servers in such companies are comatose (see the middle bar of the figure above).
Surprisingly, about 30% of virtual machines were comatose (see the rightmost bar above), indicating that the same management failures that lead to high percentages of comatose servers also afflict virtual machines. Virtualization without institutional changes is not a panacea!
One new issue raised in the latest report is important but often overlooked: zombie servers are likely not to have been updated with the latest security patches, so they present a potent risk to the safety of the data center. Find them and remove them as soon as you can!