

milliseconds, which is what you need to measure the I/O. A problem may not be captured at all if you are not looking at every I/O, in real time. What we mean here is that the experience of an application is measured not in minutes or hours but in milliseconds, so polled report logs are by definition historic. It is a common mistake to equate that historical, polled data with true performance data, where you are looking at line-rate information and every individual I/O as it happens. In addition, as most array vendors only keep 24 hours of data, it may not be possible to identify a problem or see a trend before it takes hold.
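As a rough, hypothetical illustration of that difference (the figures below are invented, not measured), a polled average over a monitoring window can look perfectly healthy while individual I/Os inside the window are suffering:

```python
# Hypothetical sketch: a five-minute polling window summarised as one average,
# versus looking at every I/O in that window. All figures are invented.

io_latencies_ms = [0.4] * 2950 + [80.0] * 50   # 3,000 I/Os: mostly fast, plus a burst of slow ones

polled_average = sum(io_latencies_ms) / len(io_latencies_ms)
worst_io = max(io_latencies_ms)
slow_ios = sum(1 for latency in io_latencies_ms if latency > 20.0)

print(f"Polled average for the window: {polled_average:.2f} ms")  # ~1.7 ms - looks healthy
print(f"Worst single I/O:              {worst_io:.1f} ms")        # the spike the application felt
print(f"I/Os over 20 ms:               {slow_ios}")               # invisible in the average
```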


• Switch issues – when we move to the second part of the stack, the Fibre Channel switches, there are often performance issues that have little to do with the vendor. Brocade and Cisco make great SAN switches. However, just like the array, they are one device in the stack and they can only report on what they can see in their own product. Some believe that they can get all the performance information they need straight out of their SAN switch. Unfortunately, that's not the case. Let's take a light-hearted example: being able to see how busy the motorway is (throughput) doesn't tell me how long it will take me to get home (latency). And what does my family care about? When I get home. I would argue that users running applications on storage infrastructure are looking at exactly the same thing. Latency is what I'm after; throughput, while critical, not so much.


And it's clear from the customer feedback we get that measuring throughput at the switch level doesn't actually give a good indication of what the I/O experience is like.
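To put rough numbers on the motorway analogy (the figures below are invented for illustration, not taken from any customer environment), two workloads can drive the same throughput through a switch port while their users have very different experiences:

```python
# Hypothetical sketch: identical switch-port throughput, very different latency.

def summarise(name, io_size_kb, iops, avg_latency_ms):
    throughput_mb_s = io_size_kb * iops / 1024
    print(f"{name}: {throughput_mb_s:.0f} MB/s throughput, {avg_latency_ms:.1f} ms average latency")

# Large sequential I/O: plenty of MB/s, each I/O completes quickly.
summarise("Backup stream", io_size_kb=256, iops=2000, avg_latency_ms=1.0)

# Small random I/O on a congested path: the same MB/s, but every I/O waits.
summarise("OLTP database", io_size_kb=8, iops=64000, avg_latency_ms=15.0)

# A throughput view shows both at about 500 MB/s; only latency reveals
# which set of users is sitting in traffic.
```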


• Physical layer issues – bad connections can often result in commands being re-issued, which leads to a flood of fabric traffic that slows down databases and eradicates the benefits of flash storage. It doesn't matter how much flash storage you buy: if your physical layer is not intact and healthy, you will not get the benefit of the investment you make.
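As a back-of-the-envelope illustration (the error rate and timings below are hypothetical), even a small proportion of retried commands drags average latency well away from what the flash is capable of:

```python
# Hypothetical sketch: how a flaky link inflates average I/O latency via retries.

clean_io_latency_ms = 0.5   # an I/O on a healthy all-flash path
retry_penalty_ms = 30.0     # a retried command sits behind timeouts and aborts
retry_rate = 0.02           # assume 2% of I/Os hit a damaged frame and retry

effective_latency_ms = ((1 - retry_rate) * clean_io_latency_ms
                        + retry_rate * (clean_io_latency_ms + retry_penalty_ms))

print(f"Healthy path:    {clean_io_latency_ms:.2f} ms per I/O")
print(f"With 2% retries: {effective_latency_ms:.2f} ms per I/O on average")
# The average more than doubles, before counting the extra fabric traffic
# that the retries themselves generate.
```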


• Queue depth – this is another interesting aspect that can cause real slow-downs, and a big reason is that it is often set by the server team rather than the storage team. Unfortunately, the larger your environment gets, the harder this issue is to manage. One server manager may raise a queue depth to increase their own performance, and that change impacts the other users (or servers) sharing the same path. To stretch the road analogy, if the server administrator's HBA tuning effectively connects a single-lane road to too many other roads, it can lead to a performance slow-down of up to 10x.
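A simple, hypothetical sum shows why one team's "tuning" hurts everyone on the shared path (the port queue limit and server counts below are invented):

```python
# Hypothetical sketch: outstanding-command arithmetic on a shared array port.

array_port_queue_limit = 2048   # commands the storage port will accept at once
servers_on_port = 40
default_queue_depth = 32

def oversubscription(per_server_qd):
    total = servers_on_port * per_server_qd
    ratio = total / array_port_queue_limit
    print(f"Queue depth {per_server_qd}: up to {total} outstanding commands "
          f"({ratio:.1f}x the port limit)")

oversubscription(default_queue_depth)  # 1280 commands - inside the limit
oversubscription(128)                  # if every team 'tunes' the same way: 2.5x the port limit
```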


• Block size – the size of the application's reads and writes needs to match the block size of the underlying storage; if the two are not tuned to each other, that too can result in performance issues. This is highly dependent on the type of application that is running. Clearly you can mitigate the issue with faster disk, but even so, there is a lot to be said for understanding this aspect.
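As a crude, hypothetical example of why the mismatch matters (the sizes below are invented), a small write landing on a much larger block can turn into a read-modify-write of the whole block:

```python
# Hypothetical sketch: write amplification when application I/O is smaller
# than the underlying block size.

def write_amplification(app_write_kb, block_kb):
    if app_write_kb >= block_kb:
        return 1.0                        # whole blocks are simply overwritten
    return (block_kb * 2) / app_write_kb  # read the block, modify it, write it back

print(write_amplification(4, 64))   # 32.0 - each 4 KB write moves 128 KB on the back end
print(write_amplification(64, 64))  # 1.0  - sizes match, no read-modify-write
```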


• CPU configuration – even in a virtualised environment, physical servers still have finite CPU capacity, and what we find is that customers keep piling on load without thinking to add CPU and memory to the physical and virtual infrastructure.


More often than not the VMware administrator will allocate too much CPU or not enough, and where there isn't sufficient CPU this can impact applications no matter what flash drives you have in place! For flash to make a difference, there also need to be enough servers and enough CPU.
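A hypothetical back-of-the-envelope check of the kind a VMware administrator might run before blaming the storage (the host size and VM mix below are invented):

```python
# Hypothetical sketch: vCPU-to-physical-core overcommit on one host.

physical_cores = 32
vm_mix = [("db01", 16), ("app01", 8), ("app02", 8), ("web01", 4)]
vms = vm_mix * 4   # pretend 16 VMs with this shape share the host

total_vcpus = sum(vcpus for _, vcpus in vms)
ratio = total_vcpus / physical_cores

print(f"{total_vcpus} vCPUs on {physical_cores} cores: {ratio:.1f}:1 overcommit")
if ratio > 4:
    print("VMs will queue for CPU time (high ready time) - flash cannot fix that.")
```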


As more and more vendors enter the market with flash arrays, there will clearly be significant disruption. However, when the infrastructure is not performing to expectations, any manner of mis-configuration can be at fault. Our experience of end-to-end monitoring leads us to estimate that 75 to 85% of all issues are not the result of storage array problems but of something else in the stack; and the more layers in the stack, and the more densely you virtualise, the worse the issue gets.


The way to pinpoint these issues is a real-time monitoring solution that covers the whole IT infrastructure, combined with proactive performance management, so that mismatches can be identified before they become real glitches. This especially applies to larger datacentres operating hundreds or even thousands of servers, where identifying a problem can be like trying to find a needle in a haystack. Customers who rely on their IT infrastructure to support mission-critical activities are finding, at their cost, that before introducing new technologies it is wise to be in control and to have a detailed view of the whole IT infrastructure.
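The shape of that proactive approach, reduced to a deliberately simplified, hypothetical sketch (the data source, threshold and record format below are all invented), is a check applied to every I/O exchange as it happens rather than to an historic report:

```python
# Hypothetical sketch: flag individual I/O exchanges that breach a latency
# threshold as they occur.

LATENCY_THRESHOLD_MS = 20.0

def alerts(io_records):
    """io_records: iterable of (timestamp, host, lun, latency_ms) tuples."""
    for timestamp, host, lun, latency_ms in io_records:
        if latency_ms > LATENCY_THRESHOLD_MS:
            yield f"{timestamp} ALERT {host} -> {lun}: {latency_ms:.1f} ms exchange"

sample = [
    ("10:14:03.120", "esx-07", "lun-42", 1.2),
    ("10:14:03.121", "esx-07", "lun-42", 38.6),
    ("10:14:03.123", "esx-11", "lun-08", 0.9),
]
for line in alerts(sample):
    print(line)
```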



