storage ICT
cases, it is only after such confirmation that root cause analysis can start in earnest elsewhere. One of the important roles of I/O Management (IOM) is to quickly identify whether a performance issue originated in the storage network or within the application server. IOM can also be used to spot potential issues before they become problems. The bottom line, however, is that when identifying an I/O bottleneck, it is just as important to examine the application server's I/O flows as it is to examine the SAN storage.
I/O latency
I/O response time is one of the key indicators of storage network performance. If I/O response time increases, it will eventually affect application performance. Many kinds of system issue can drive I/O latency up, and I/O management provides the ability to identify them.
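As a rough illustration of how response time can be sampled from the application side, the sketch below times synchronous reads against a file. It is a minimal, hypothetical example: it does not bypass the page cache, so absolute numbers will be optimistic, but the trend in the percentiles is the kind of signal an IOM tool watches.

```python
import os
import time

def sample_read_latency(path, block_size=4096, samples=100):
    """Time synchronous reads to estimate I/O latency.

    Illustrative sketch only: cache bypass (O_DIRECT) is omitted for
    portability, so results include page-cache effects.
    """
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for i in range(samples):
            # Step through the file so we are not re-reading one block.
            offset = (i * block_size) % max(size - block_size, 1)
            os.lseek(fd, offset, os.SEEK_SET)
            start = time.perf_counter()
            os.read(fd, block_size)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    latencies.sort()
    return {
        "avg_ms": sum(latencies) / len(latencies) * 1000,
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1] * 1000,
    }
```

Tracking the high percentiles, not just the average, matters here: a rising p99 is often the first visible symptom of queueing somewhere in the I/O path.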
Operating system command queues
Operating systems utilise command queues to control the flow of I/O through the operating system kernel. Two pieces of information about a command queue matter for performance: the rate at which the queue is being filled, and the rate at which it is being emptied. Together they provide valuable insight when monitoring or debugging performance. While these mechanics are simple, the instrumentation to monitor the queues, detect issues and proactively alert system administrators is complex and typically non-existent in the IT infrastructure.
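The two rates can be made concrete with a small, hypothetical monitor. The class below is not a real kernel interface; it simply models a bounded command queue and reports fill rate, drain rate and depth per interval, with the alert condition being a sustained fill rate above the drain rate.

```python
from collections import deque

class CommandQueueMonitor:
    """Track fill and drain rates of a command queue over an interval.

    A sustained fill rate above the drain rate means queue depth is
    growing and latency will climb; that is the condition to alert on.
    """

    def __init__(self, depth_limit):
        self.depth_limit = depth_limit
        self.queue = deque()
        self.enqueued = 0   # commands submitted this interval
        self.dequeued = 0   # commands completed this interval

    def submit(self, cmd):
        if len(self.queue) >= self.depth_limit:
            raise RuntimeError("queue full: device cannot keep up")
        self.queue.append(cmd)
        self.enqueued += 1

    def complete(self):
        self.queue.popleft()
        self.dequeued += 1

    def interval_report(self):
        """Return this interval's rates and reset the counters."""
        report = {
            "fill_rate": self.enqueued,
            "drain_rate": self.dequeued,
            "depth": len(self.queue),
            "backlog_growing": self.enqueued > self.dequeued,
        }
        self.enqueued = self.dequeued = 0
        return report
```

Real instrumentation would sample these counters from the kernel rather than wrap the queue itself, but the alerting logic is the same comparison of the two rates.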
I/O utilisation profile
Storage solutions have long monitored, reported on and alerted about areas such as link utilisation and storage capacity utilisation. What is often more interesting, particularly when dynamically managing workloads and troubleshooting performance issues, is I/O utilisation. While the various elements in a storage network are all specified to carry a certain bandwidth, many factors determine how many I/O operations those elements can actually handle, and bottlenecks occur within the various I/O handling layers. This is where IOM adds critical insight into performance and availability management.
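The distinction between bandwidth utilisation and I/O-operation utilisation can be shown with a back-of-the-envelope calculation. The figures below (an 8 Gb/s link, a 120,000 IOPS ceiling) are illustrative assumptions, not the spec of any particular device.

```python
def utilisation_profile(iops, avg_io_bytes, link_bytes_per_s, max_iops):
    """Compare bandwidth utilisation with I/O-operation utilisation.

    With small average I/O sizes a link can exhaust its operation
    budget long before it saturates on bandwidth, and vice versa.
    """
    bandwidth_util = iops * avg_io_bytes / link_bytes_per_s
    iops_util = iops / max_iops
    bottleneck = "iops" if iops_util > bandwidth_util else "bandwidth"
    return {
        "bandwidth_util": bandwidth_util,
        "iops_util": iops_util,
        "bottleneck": bottleneck,
    }

# Example: 100,000 small 512-byte I/Os per second on an 8 Gb/s link
# (roughly 1 GB/s) rated for 120,000 IOPS. The link's bandwidth is
# barely 5% used, while the operation budget is over 80% consumed.
```

This is why link utilisation graphs alone can look healthy while the system is, in fact, I/O-bound.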
I/O protocol analysis
Many of today’s performance management solutions rely on industry-standard interfaces, including CIM/WMI/SMI-S, SNMP and vendor-specific application programming interfaces (APIs). For the most part, all of these APIs are built on top of basic I/O counters. The data provided by these counters can include throughput data, I/O operations data and error counters. While there are evolving flavours for presentation of this data, the data is largely the same.

www.dcseurope.info | November 2013
The missing component, the data that is most critical to understanding and resolving problematic performance issues, is related to what is happening at the protocol layers. This is why, in the most dire of situations, data centre engineers will turn to inserting protocol analyser hardware into the system in an attempt to understand what’s happening at the lowest levels of I/O exchanges. This is not a simple task.
A better approach is to determine the most helpful protocol events to capture, and capture them all the time, as a matter of course, without consuming too much application server CPU and memory. Problem resolution, armed with key information from the protocol layers, can then take place immediately, without waiting for a problem to repeat itself. Capturing the interesting information with human-readable context also reduces the reliance on engineering experts to solve problems.
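One common way to make always-on capture affordable is to filter at the source and keep only a bounded ring of recent, annotated events. The sketch below assumes SCSI-flavoured event names (CHECK CONDITION, TASK SET FULL and so on) purely for illustration; the set of "interesting" events and the class itself are hypothetical.

```python
from collections import deque
import time

class ProtocolEventRing:
    """Always-on capture of notable protocol events in a bounded ring.

    Keeping only the most recent N annotated events caps memory use,
    so capture can run continuously; when a problem surfaces, the ring
    already holds the surrounding protocol context.
    """

    # Illustrative set of events worth recording; a real tool would
    # derive this from the transport protocol in use.
    INTERESTING = {"ABORT", "CHECK_CONDITION", "BUSY", "TASK_SET_FULL"}

    def __init__(self, capacity=1024):
        self.ring = deque(maxlen=capacity)  # oldest entries evicted

    def observe(self, event_type, context):
        # Filter at the source: record only events worth keeping.
        if event_type in self.INTERESTING:
            self.ring.append({
                "time": time.time(),
                "event": event_type,
                "context": context,  # human-readable annotation
            })

    def dump(self):
        """Return the captured history for analysis after an incident."""
        return list(self.ring)
```

The bounded ring is the key design choice: overhead stays constant regardless of traffic volume, which is what makes "capture all the time" practical.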
Putting it all together
Collecting the right information is just the first part of effectively managing availability and performance in today's complex environments. IOM also includes integrating that information to present useful reports, including integration into higher-level enterprise management solutions. These reports and alerts provide key benefits such as reduced capital costs and operating expenses, by decreasing both scheduled and unscheduled downtime and optimising the performance and utilisation of expensive I/O infrastructure.

Conclusion
Big Data, server virtualisation, I/O caching technologies and new scale-out architectures are impacting today's data centre environments, which are growing in both size and complexity. This creates challenges, as data centre administrators struggle to fully utilise expensive infrastructure while remaining highly responsive in the areas of performance management and problem management. While storage resource management tools are adept at resolving performance issues related to the fabric links, and storage performance management tools are well tailored to managing the array's performance, it is often I/O contention throughout the system that requires attention. I/O management tools that proactively monitor I/O from the application's perspective allow the data centre to:
• Make better use of the existing infrastructure and avoid purchasing new infrastructure
• Proactively manage system performance, making adjustments before there is a performance problem
• Encourage true problem management, where root cause is established and the problem is resolved for good
These tools will only become more critical as increased layers of virtualisation are deployed over time, including server virtualisation, storage virtualisation and I/O virtualisation.