

Slow performance can be measured in large amounts of lost revenue, or as simply as every user of an application opening a support ticket at the same time, and some applications support hundreds or even thousands of users. I spoke recently with a European lottery organization that said a slowdown of their online system on a Saturday night could cost them one million Euros per minute in lost revenue. For them, poor response times are very damaging, as the public is not prepared to wait for an impulse buy to confirm.


You can’t manage what you can’t see (or... how “real time” is your “real time”?)

They say that in the land of the blind the one-eyed man is king... Your mission-critical applications are the life blood of your business. They are tended diligently to ensure they perform well across the internal network and the public internet, providing users with uninterrupted service. However, the infrastructure where they reside, the FC SAN, has lots of blind spots, and many customers treat their whole estate as though it were test and development, when they should be treating the critical infrastructure differently.


Sadly, few understand how limited each tool’s view really is. The virtualised server software can see what’s happening from the server’s point of view (CPU and memory stats are great, but what about I/O?). The switch fabric managers can see their own elements plus a few attachments; this is like counting the number of cars on the road (traffic congestion), when what people really care about is how long the car takes to get from home to work (latency).


Lastly, storage management software can see what’s happening inside the arrays, but regrettably, if you don’t buy everything from the same storage array vendor, none of them really offer the heterogeneous, end-to-end, real-time view on a single pane of glass that is essential for infrastructure performance optimization. So you are managing the critical application infrastructure with multiple software tools, stitching the results together, hoping and praying that nothing will go wrong in the infrastructure you can’t see.


SAN blindness is the real infrastructure money waster. Stop throwing your money away, and stop guessing what performance you’re going to get from your infrastructure!


Most IT managers know they have parts of the infrastructure they can’t see. To get around the ‘fire and forget’ policy, where data enters the SAN and comes out the other end without any knowledge of what happened to it during its journey, they massively overprovision all the elements to ensure resilience. This is not a failure on their part; it’s just that until recently a VM-to-LUN view was not possible. This overprovisioning (e.g. the typical utilisation level of an enterprise switch port is less than 10%) is causing massive, unnecessary overspending on SAN switching capacity.
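To put that 10% figure in context, port utilisation is just observed throughput divided by the port’s usable line rate. The sketch below is a minimal illustration; the 8 Gbit/s port and the 60 MB/s workload are hypothetical numbers, not measurements from the article.

```python
# Rough utilisation check for a Fibre Channel switch port (illustrative numbers only).

LINE_RATE_GBPS = 8        # nominal 8 Gbit/s FC port (hypothetical)
FC_EFFICIENCY = 0.8       # 8b/10b encoding leaves ~80% of the raw rate for data

def port_utilisation(observed_mb_per_s: float) -> float:
    """Return utilisation as a fraction of the port's usable line rate."""
    usable_mb_per_s = LINE_RATE_GBPS * 1000 / 8 * FC_EFFICIENCY  # ~800 MB/s
    return observed_mb_per_s / usable_mb_per_s

if __name__ == "__main__":
    # A port averaging 60 MB/s on an 8 Gbit/s link is well under 10% utilised.
    print(f"{port_utilisation(60):.1%}")
```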


A recent Gartner report stated that typical storage array capacity utilization rates are 67.3% of configured usable capacity. The vendors of the SAN elements are certainly in no hurry to change your policy, but I would argue - they should be! Improving utilization levels can save vast amounts of budget and the way to do that is to be able to proactively monitor all elements of the SAN in real time. Large enterprise data centres require vendor-independent real-time monitoring.


Bang...


So what happens when something does go wrong in the SAN, even a minor issue? How many people got involved the last time Exchange slowed down? You don’t have to have an “outage” to feel the pain. Typically, it results in lots of finger pointing and confusion. The storage department is usually regarded as guilty until proven innocent by everybody else, including IT staff and vendors.


Finding the root cause of latency problems, or worse still an outage, can be a long, drawn-out task. Most monitoring tools average their results over time, typically from 5 to 30 minutes, so they can’t see the ‘spikes’ of latency that cause problems. Remember, application performance is often measured in millisecond response times; if you are only sampling every five minutes, you are blind to hundreds of thousands of millisecond-scale intervals in between, any one of which could hold the fault. A five-minute average can easily show that everything is running normally while an intermittent fault is occurring. Switches don’t like to be polled frequently as it affects their performance; the more you poll, the slower they react.
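A toy example makes the averaging problem concrete. The numbers below are hypothetical, not from any real tool: one 900 ms spike buried in five minutes of otherwise healthy I/O latency all but disappears in the average.

```python
# Toy example: a five-minute average hides a one-second latency spike.
# All figures are hypothetical.

samples_ms = [0.5] * 300      # 300 one-second samples of ~0.5 ms I/O latency
samples_ms[120] = 900.0       # a single 900 ms spike two minutes in

average = sum(samples_ms) / len(samples_ms)
peak = max(samples_ms)

print(f"5-minute average: {average:.2f} ms")   # ~3.5 ms: looks healthy
print(f"worst sample:     {peak:.1f} ms")      # 900 ms: the user-visible pain
```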


TAP all SAN ports


How do you ensure application performance and availability without the monitoring itself adding to the problem? Although Traffic Access Ports (TAPs) are well known in the IP space, their adoption in the Fibre Channel space is growing fast; their use is becoming data centre best practice during a refresh, and suppliers are often running into shortages! Fibre Channel TAPs, together with dedicated hardware and software probes to analyse the data flow, are the solution.


The TAP is an optical splitter, either an in-line device or embedded in a patch panel, that diverts a portion of the light from the Fibre Channel signal so it can be analysed off line (don’t worry, it’s all sanctioned by SNIA and FCA). When you use a TAP and you can read the protocol... any OS, any storage type, and any switch can be seen! TAPs enable out-of-band analysis of the frame headers, which tell you where the data is going and where it came from, how long it’s taking to get there, and a bevy of information on transmission problems.
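As a minimal sketch of where that “who, where, and how long” information lives, the snippet below decodes a few fields from the standard 24-byte Fibre Channel frame header (source and destination port addresses, exchange ID) and pairs frames of the same exchange to estimate round-trip latency. It assumes you already have timestamped header bytes from a probe; the dedicated probes the article describes do this in hardware at line rate.

```python
import struct
from collections import namedtuple

# Sketch: extract routing and exchange fields from a captured 24-byte FC-2
# frame header, then pair frames of the same exchange (OX_ID) seen in both
# directions to estimate round-trip latency. Input is assumed to be an
# iterable of (timestamp_seconds, header_bytes) pairs from a TAP/probe.

FCHeader = namedtuple("FCHeader", "r_ctl d_id s_id fc_type ox_id")

def parse_fc_header(raw: bytes) -> FCHeader:
    """Decode the fields we care about from a 24-byte FC frame header."""
    r_ctl = raw[0]
    d_id = int.from_bytes(raw[1:4], "big")       # destination port address
    s_id = int.from_bytes(raw[5:8], "big")       # source port address
    fc_type = raw[8]                             # 0x08 = SCSI-FCP
    ox_id = struct.unpack(">H", raw[16:18])[0]   # originator exchange ID
    return FCHeader(r_ctl, d_id, s_id, fc_type, ox_id)

def exchange_latencies(frames):
    """Yield (ox_id, seconds) for each exchange seen in both directions."""
    first_seen = {}
    for ts, raw in frames:
        hdr = parse_fc_header(raw)
        key = (min(hdr.s_id, hdr.d_id), max(hdr.s_id, hdr.d_id), hdr.ox_id)
        if key in first_seen:
            yield hdr.ox_id, ts - first_seen.pop(key)
        else:
            first_seen[key] = ts
```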


If you are looking at the Fibre Channel traffic itself in a real-time dashboard, it doesn’t matter what vendor equipment you have; it can see it all, from the virtualised server right down to the LUNs on the storage array. It gives you the tools you need to optimize performance, availability, and overall data centre infrastructure and staff costs.



