DCA REVIEW Resilience & Operational Best Practice
Are ‘resilience’ and ‘operational best practice’ possible at the same time?
By Ian Bitterlin, Chief Technical Officer, Emerson Network Power Systems.
WHEN I SAW the theme for this month’s DCA editorial it occurred to me that pursuing Best Practice in the data centre M&E infrastructure, particularly in the area of energy efficiency, could sometimes actually be in conflict with the pursuit of resilience, if that is defined as a form of reliability or availability.
We all know that in the pursuit of energy efficiency the low-hanging fruit is still to be found in the cooling system and, clearly, there are some forms of best practice that are good for energy efficiency and don’t impact reliability, the most obvious being airflow management in the form of hot-aisle/cold-aisle layout, blanking plates and hole-stopping. These three actions are well proven to reduce cooling power by reducing bypass air, and it is clear that the reliability of the ICT load is not affected.
However, taking air management to the next level with aisle containment can, under certain circumstances, lead to a requirement for ‘continuous’ cooling, because the volume of conditioned (cold) air is reduced to only that contained in the aisle and under the floor. Those circumstances are heightened when the cabinet load rises above 10kW, where a momentary cooling failure will precipitate a rapid climb in server inlet temperature. If (and there are a lot of ‘ifs’ here) the cabinet load is high, the set inlet temperature is taken to the higher end of the ASHRAE 2011 ‘Recommended’ range or, for the few intrepid early adopters, even into the ‘Allowable’ range, and the cooling system fails for a few minutes, then the temperature can rise quickly to the point where the servers shut down on ‘over-temperature’.
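To put a rough number on how fast that climb can be, here is a minimal back-of-envelope sketch in Python. It assumes the only thermal buffer is the conditioned air trapped in the contained aisle and floor void, and the illustrative figures (ten 10kW cabinets sharing 30m³ of contained air) are assumptions chosen for the example rather than data from any particular site.

```python
# Minimal sketch (illustrative assumptions only): rate of air-temperature rise
# in a contained cold aisle once cooling stops, treating the conditioned air in
# the aisle and floor void as the only thermal buffer.

AIR_DENSITY = 1.2         # kg/m^3, approximate for air at room temperature
AIR_SPECIFIC_HEAT = 1005  # J/(kg*K)

def aisle_temp_rise_rate(it_load_kw: float, contained_volume_m3: float) -> float:
    """Approximate temperature rise rate (K per second) of the contained air
    mass when cooling fails and the full IT load dumps heat into it.
    Ignores the thermal mass of racks, floor and servers, so it is a
    worst-case (fastest) estimate."""
    heat_w = it_load_kw * 1000.0
    air_mass_kg = AIR_DENSITY * contained_volume_m3
    return heat_w / (air_mass_kg * AIR_SPECIFIC_HEAT)

# Example: ten 10kW cabinets on a contained aisle plus floor void of 30 m^3
rate = aisle_temp_rise_rate(it_load_kw=100, contained_volume_m3=30)
print(f"~{rate:.1f} K/s -> ~{rate * 60:.0f} K/min")  # roughly 2.8 K/s
```

Even allowing for the thermal mass the model ignores, the result shows why high-density containment pushes designs towards ‘continuous’ cooling.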
Clearly there is a link (under extreme conditions) between pursuing energy efficiency and service availability when applying close to 100% air separation without continuous cooling. In addition, that ‘continuous cooling’ will have been achieved by adding redundancy, even UPS, and will have degraded the system’s ability to operate at maximum efficiency.
Widening the thermal limits to those recommended in ASHRAE 2011 also has its risks, both real and perceived. Most people refer to the 2011 Guidelines just in terms of the table of temperature and humidity, but the other 40 or so pages are well worth a close read. There you will find links between air quality, elevated temperature and humidity that predict an increased failure rate of the servers themselves. The ‘perfect storm’ of fresh air (direct economisation, especially with contaminants such as chlorides and sulphides), high temperature and high relative humidity results in accelerated PCB corrosion, and the impact on server failure is enumerated in the Guidelines. One solution (apart from avoiding fresh air in the room) is to refresh the ICT hardware every 2-3 years, but not all users can do that for CapEx reasons, even though the server energy costs will drastically reduce at the same time.
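For readers who only ever look at the headline table, here is a minimal sketch of what the dry-bulb bands look like in practice, assuming the 2011 Class A1 figures of 18-27°C ‘Recommended’ and 15-32°C ‘Allowable’; the humidity and dew-point limits that matter so much for corrosion are deliberately omitted for brevity.

```python
# Minimal sketch: classify a server inlet dry-bulb temperature against the
# ASHRAE 2011 envelopes (assumed Class A1 figures; humidity limits omitted).

RECOMMENDED = (18.0, 27.0)   # degC, 2011 'Recommended' envelope
ALLOWABLE_A1 = (15.0, 32.0)  # degC, 2011 Class A1 'Allowable' envelope

def classify_inlet_temp(temp_c: float) -> str:
    """Return which ASHRAE 2011 envelope a dry-bulb inlet temperature falls into."""
    if RECOMMENDED[0] <= temp_c <= RECOMMENDED[1]:
        return "Recommended"
    if ALLOWABLE_A1[0] <= temp_c <= ALLOWABLE_A1[1]:
        return "Allowable (A1)"
    return "Out of envelope"

for t in (22.0, 29.5, 34.0):
    print(t, classify_inlet_temp(t))
```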
Another area for saving energy (admittedly a poor second to the cooling opportunities) is in the UPS system. Here ‘best practice’ (a common topic in both the EU CoC and The Green Grid’s DCMM) would indicate that ‘eco-mode’ is the way to go, and here we have an interesting view of the paranoia that so often regulates our data-centre world. Technically, with the more advanced forms of eco-mode operation (where the grid is used whenever its power quality is within the ITIC PQ Curve), there is no doubt that the technology works and every reason to enable eco-mode, saving 3-5% of your ICT load energy.
When the grid deviates (and before that deviation reaches the load outside of the ITIC Curve) the UPS switches back to ‘normal’ on-line operation. Is there a ‘real’ risk? The answer is probably ‘yes, but minuscule’, to the point where it is impossible to measure the increase in failure rate over the typical 12-15 year life of the plant; the ‘perceived’ risk (the paranoia), however, is higher. We are clearly at a point where the ever-accelerating cost of energy is increasing the adoption rate of eco-mode, but there is still a reluctance to trade the energy-saving benefits for a (perceived) degradation in resilience.
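As an illustration of that transfer logic, the following is a minimal sketch of an eco-mode decision, assuming a 230V nominal supply and only the steady-state slice of the ITIC envelope (roughly ±10% of nominal); a real UPS controller also applies the curve’s time-limited tolerance for deeper, shorter sags and swells, which this sketch ignores.

```python
# Minimal sketch: run on raw mains (bypass) while the supply stays inside a
# simplified, steady-state slice of the ITIC envelope, and transfer back to
# double-conversion the moment it deviates. Nominal voltage and the +/-10%
# band are assumptions for the example.

NOMINAL_V = 230.0          # assumed nominal phase voltage
STEADY_STATE_BAND = 0.10   # assumed steady-state tolerance, roughly +/-10%

def grid_within_envelope(measured_v: float) -> bool:
    """True if the measured voltage is inside the simplified steady-state band."""
    low = NOMINAL_V * (1.0 - STEADY_STATE_BAND)
    high = NOMINAL_V * (1.0 + STEADY_STATE_BAND)
    return low <= measured_v <= high

def select_ups_path(measured_v: float) -> str:
    """Choose the UPS power path for the current voltage sample."""
    return "eco-mode (bypass)" if grid_within_envelope(measured_v) else "double-conversion"

for sample in (231.0, 226.0, 198.0, 255.0):
    print(f"{sample:6.1f} V -> {select_ups_path(sample)}")
```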
The bottom line comes down to the pressure for energy reduction balanced against the need to provide the digital services at the Availability required by the business. No data centre was ever built to save its user energy; its prime purpose is to generate or protect the revenues that enable profits to be realised and reputations to be maintained.
At one extreme we have the bank that loses customers because its ATMs keep displaying ‘not available at this time’ and its POS card readers decline purchases; it will not survive unless it invests in redundancy and high availability, and energy efficiency is a secondary target. At the other end of the scale we have a social networking site whose costs are dominated by the cost of electricity and whose service can ride through brief excursions that do not negatively impact its users and sponsoring advertisers. The very first question to ask a new data-centre developer should be ‘what is your appetite for risk?’, whilst the second may well be ‘what is your opportunity for maintenance shutdowns?’ rather than ‘what are your energy efficiency expectations?’