FOCUS DOWNTIME


Issue 16, June/July


UPTIME ON DOWNTIME


Human error has remained the number-one cause of data center downtime. Yevgeniy Sverdlik asks the Uptime Institute if the industry will ever beat this trend


Rick Schuknecht, VP and global networks executive at the Uptime Institute, has analyzed the institute's data on unplanned downtime incidents to produce yearly reports. We caught up with him at the Uptime Institute Symposium in Silicon Valley to discuss some of the most notable trends.


DCD FOCUS: Do you look at the cost of downtime?


RS: We don’t particularly look at what the cost is. That’s kind of a difficult subject. When you’re talking to a company, they generally won’t know what the cost of an outage is.


Most downtime incidents are attributable to human error, but a recent survey by the Uptime Institute concluded that the number of humans working in data centers is shrinking.


[Survey charts: three questions, with chart values listed in ascending order rather than matched to categories]


How would you rate your current staffing level? Categories: Overstaffed, Adequate, Understaffed. Chart values: 1%, 32%, 67%.


Do you have 24/7 staffing presence at any of your data centers? Categories: Security only, No, Yes. Chart values: 10%, 21%, 69%.


What is the significant constraint to your staffing needs? Categories: Management approval, Qualifications, Budget. Chart values: 18%, 21%, 61%.


Of the failures that have occurred, about 73% are directly attributed to human error, and the other 27% are scattered among the other categories.


Can you provide us with a few examples?


One example is where a system was designed correctly, but not built properly, and therefore didn’t function as designed. Or where a system was installed as it was designed, but was not operated correctly, which led to a failure. In other words, it was placed in a higher capacity mode than it was designed for.


Typically, we see failures related to restoring the system once the maintenance activity has been done. Usually there’s a script used to take a piece of equipment or system out of service and there’s a script to put it back into service, and one of the steps was either incorrect or not followed properly. The system was put back in service… and tripped off because it was not restored properly.


18 www.datacenterdynamics.com


What are some of the most interesting conclusions drawn from this data?


The human error factor stayed at about three out of four for all these years. Given the [emphasis] that the Uptime Institute, 7x24, AFCOM, and others in presentations and conferences and symposia put on the human interface with computers, it’s interesting that that number stays the same. [This is] because the awareness is certainly there in many organizations: in all our member companies and many other organizations that are not part of the Uptime Institute Network.


But the record is fairly consistent. Year over year, 75% of incidents are attributable to human error. That’s pretty interesting to me.


The Uptime Institute has a data repository about unplanned downtime incidents that dates back 18 years. Rick Schuknecht analyzes it for the institute.


The cost of an outage can be viewed in a number of different areas. It could be loss of revenue. It could be loss of continuous transactional throughput, which sometimes equates to revenue loss and sometimes doesn’t. It could be loss of reputation in the industry. It could be a regulatory violation.


Different companies, depending on the business sector they’re in, have different ways of looking at the cost of an outage. When I was on the corporate side (not working for the Uptime Institute) with a major national bank, we had calculated that the cost of an outage [for us] was roughly equivalent to US$5m a minute.


Have you seen changes in the main causes of downtime over the past five years?


No. For the past five years the data has been really consistent: three-quarters, or almost three-quarters, has been attributable to human error.


Is this the main cause of downtime then?


The data tells us that about 10% of the total events reported are actual failures. The rest are near-misses, where something kept the event from cascading into a total failure.


Human error is the biggest cause of DC alarm


Are you talking about all types of systems?


Yes. Power, cooling, control system, fire-life safety system.


Several of the errors reported to us were directly related to improper management of the fire-life safety system. A lot of the electrical and mechanical systems are linked to the fire system, so that if the fire system goes off it will automatically cause a shutdown. A lot of times we see that the fire-system maintenance activity being performed is not properly scripted, or the script is not followed, so it inadvertently shuts down the mechanical or electrical system.


Are other types of systems more prone to unplanned outages?


In 2010 there were 23 failures reported out of 305 events. Out of those 23 failures, 20 of them were electrical and three were mechanical. About 80% of those 20 [electrical-related incidents] were critical-power-distribution (downstream of the UPS) failures mostly caused by human error, and the others were in the UPS systems. The three that were on the mechanical side were all fire-life safety problems.
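

The 2010 figures quoted above can be turned into shares with a few lines of arithmetic. The sketch below is only an illustration: the variable names and the pct helper are hypothetical, the "about 80%" figure is taken as exactly 0.80, and every non-failure event is counted as a near-miss, following the earlier remark that the rest of the reported events are near-misses.

```python
# Figures quoted in the interview for 2010.
events = 305        # total events reported
failures = 23       # events that were actual failures
electrical = 20     # failures on the electrical side
mechanical = 3      # failures on the mechanical side

# Assumption: "about 80%" is treated as exactly 0.80 for illustration.
power_distribution_share = 0.80


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    # Share of reported events that became actual failures.
    "failure_rate_pct": pct(failures, events),
    # Assumption: every event that was not a failure was a near-miss.
    "near_miss_pct": pct(events - failures, events),
    "electrical_pct_of_failures": pct(electrical, failures),
    "mechanical_pct_of_failures": pct(mechanical, failures),
    # Approximate count of critical-power-distribution failures.
    "power_distribution_failures": round(power_distribution_share * electrical),
}
# Remaining electrical failures, which the interview places in the UPS systems.
summary["ups_failures"] = electrical - summary["power_distribution_failures"]

for name, value in summary.items():
    print(f"{name}: {value}")
```

On these numbers, 23 failures out of 305 events is about 7.5%, a little under the "about 10%" cited for the repository as a whole.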


By Yevgeniy Sverdlik

