SNIA SSSI


Synthetic workload mapping


File system and application level workloads traverse the software/hardware stack and present a given IO stimulus to the Device Under Test (DUT). While these IOs may vary over time and by workload, general patterns have been observed that associate a given read/write ratio, block size, and access pattern with workload types. The table on the previous page shows general device-level workload characteristics that have been observed to be associated with certain user application level workloads.
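To make this mapping concrete, a synthetic profile of this kind can be expressed as a small lookup of device-level parameters per workload type. The sketch below is illustrative only; the profile names and numbers are invented placeholders, not the values from the table:

import random

# Hypothetical device-level profiles keyed by workload type; the read/write
# mix, block size, and access pattern are illustrative, not the table's values.
PROFILES = {
    "oltp-like": {"read_pct": 70, "block_bytes": 8 * 1024, "pattern": "random"},
    "streaming": {"read_pct": 90, "block_bytes": 128 * 1024, "pattern": "sequential"},
    "logging":   {"read_pct": 10, "block_bytes": 64 * 1024, "pattern": "sequential"},
}

def synthetic_io(profile, device_bytes, count):
    """Yield (op, offset, length) tuples matching a device-level profile."""
    p = PROFILES[profile]
    offset = 0
    for _ in range(count):
        op = "read" if random.randrange(100) < p["read_pct"] else "write"
        if p["pattern"] == "random":
            # align random offsets to the block size
            offset = random.randrange(device_bytes // p["block_bytes"]) * p["block_bytes"]
        yield (op, offset, p["block_bytes"])
        if p["pattern"] == "sequential":
            offset = (offset + p["block_bytes"]) % device_bytes

for io in synthetic_io("oltp-like", 256 * 2**20, 5):
    print(io)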


Characteristics of file system level testing

File system level testing differs from testing at the device level. It generally involves directly issuing specific file IO operations through the file system, targeted at the DUT.


Because SSDs can be 1,000 to 20,000 times faster than conventional HDDs (SSDs regularly reach 60,000 IOPS, compared to roughly 300 IOPS for a conventional HDD) and can have much smaller latencies (SSD latencies are usually measured in microseconds, HDD latencies in milliseconds), timing issues and the effects of outstanding concurrent IOs are much more important in SSD performance measurement. SSDs may therefore be more susceptible to, and encounter more frequently, file system effects that can alter the nature of the outstanding IOs through caching, fragmentation, split IOs, and the like, as discussed below.
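The scale difference is easy to quantify with Little's Law, which says sustained throughput is roughly the number of outstanding IOs divided by the mean service latency; this is why queue depth matters so much more for microsecond-class devices. The figures below are round illustrative numbers, not measurements:

def iops(queue_depth, latency_s):
    """Little's Law: throughput = concurrency / latency."""
    return queue_depth / latency_s

# HDD: ~10 ms service time; extra queue depth mostly adds waiting.
print(iops(1, 10e-3))    # ~100 IOPS
# SSD: ~100 microsecond latency; concurrency multiplies throughput.
print(iops(1, 100e-6))   # ~10,000 IOPS
print(iops(32, 100e-6))  # ~320,000 IOPS, if the device can absorb QD32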


These variables and their effects on SSD performance are generally related to the following:

• The test application/test stimulus generator itself
• The various components/drivers that sit "between" the stimulus generator and the DUT
• The interactions of these components within the OS software stack
• The characteristics of each file system
• The underlying computer hardware platform


As an example of how these interactions may affect the resultant IO applied to the DUT (that is, the transfer function between the generated IO and the IO "seen" at the DUT is not unity), consider the diagram above. Some of the specific variables that can impact SSD performance testing at the file system level, as well as application IO performance in general, are:


User Workloads


A primary interest for many, if not most, end users when comparing performance among SSDs is to determine and substantiate the performance benefits that can be gained within their specific computing environments, using their particular applications of interest. However, the range and diversity of available applications, along with the particular manner in which they are actually used, introduce a significant set of factors that can impact application IO performance.


Fragmentation


As one example, a single file IO operation issued by an application may, on the one hand, require multiple IO operations to the physical device due to file fragmentation. The same IO could also result in no physical device access at all, due to various caching strategies that may be implemented at the OS or driver level. Furthermore, the drivers can also split or coalesce IO commands, which can result in the loss of 1:1 correspondence between the originating IO operation and the physical device access.
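A rough sketch of the fragmentation effect: given a hypothetical extent map of the kind a file system could report (for example, via FIEMAP on Linux), one logical read can be counted off against the discrete device accesses it implies:

# Hypothetical extent map: (logical_offset, length, physical_lba) per extent.
extents = [
    (0,      65536, 8_000_000),
    (65536,  65536, 2_000_000),   # fragmentation: next extent is far away
    (131072, 65536, 9_500_000),
]

def device_ios_for_read(extents, offset, length):
    """Count how many discrete device accesses one logical read requires."""
    ios = 0
    end = offset + length
    for ext_off, ext_len, _lba in extents:
        if ext_off < end and offset < ext_off + ext_len:
            ios += 1  # each touched extent becomes at least one device IO
    return ios

print(device_ios_for_read(extents, 0, 196608))  # one logical read -> 3 device IOs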


Timing

Various timing considerations can have a notable impact on the manner in which IO operations traverse the OS software stack. For instance, while several applications may each be performing sequential access IO operations to their respective files, these concurrent IO operations can be observed to arrive in a more random access pattern at the lower-level disk drivers and other components (due to system task switching, intervening file system metadata IO operations, etc.).
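A toy model of this interleaving: two applications each read their own file strictly sequentially, yet simple round-robin scheduling makes the address sequence arriving at the device jump back and forth (the base addresses are invented for illustration):

# Each stream is sequential within its own file; base LBAs are hypothetical,
# spaced 8 sectors (4 KiB) apart within each stream.
stream_a = [1_000_000 + i * 8 for i in range(4)]   # app A, sequential
stream_b = [9_000_000 + i * 8 for i in range(4)]   # app B, sequential

# Round-robin task switching interleaves the two streams at the device.
arrivals = [lba for pair in zip(stream_a, stream_b) for lba in pair]
print(arrivals)
# [1000000, 9000000, 1000008, 9000008, ...] -- no longer sequential at the DUT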


Concurrency

All hosts process IOs with some degree of concurrency. However, the degree of concurrency can be highly dependent on host characteristics. For example, the number of processing cores and the ability to parallelize execution threads can each affect how efficiently concurrent IOs are processed, in turn affecting the (apparent) performance of the DUT.
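One way to observe the host-side effect is to issue the same set of reads at several concurrency levels and compare elapsed times. A minimal sketch, assuming a pre-existing test file of at least 4 MB at a placeholder path:

import os
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

PATH, BLOCK, COUNT = "/tmp/testfile", 4096, 1024  # hypothetical test file

def timed_reads(workers):
    fd = os.open(PATH, os.O_RDONLY)
    try:
        start = perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread is thread-safe (each call carries its own offset) and
            # releases the GIL during the syscall, so threads overlap real IO.
            list(pool.map(lambda i: os.pread(fd, BLOCK, i * BLOCK), range(COUNT)))
        return perf_counter() - start
    finally:
        os.close(fd)

for workers in (1, 4, 16):
    print(workers, "workers:", timed_reads(workers), "s")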


Caching


System caches may intercept small IOs directed at the DUT, returning the requested data without ever actually accessing the DUT.
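On Linux, a common countermeasure is to open the test file with O_DIRECT, which bypasses the page cache so that reads actually reach the DUT; it requires block-aligned buffers. A minimal, Linux-specific sketch with a placeholder path and an assumed 4 KiB device block size:

import mmap
import os

PATH, BLOCK = "/tmp/testfile", 4096  # hypothetical file; BLOCK must match device alignment

# O_DIRECT (Linux-specific) bypasses the page cache, so the read is served
# by the device rather than from RAM.
fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
buf = mmap.mmap(-1, BLOCK)  # anonymous mmap is page-aligned, as O_DIRECT requires
with os.fdopen(fd, "rb", buffering=0) as f:
    f.readinto(buf)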


Coalescing


It is also possible for file system level tests to transparently coalesce smaller transfers, effectively modifying the stimulus from small block, random transfers into larger block, more sequential transfers.
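The merge itself is straightforward to model: adjacent small requests collapse into one larger transfer. A sketch of the transformation, with arbitrary offsets and sizes:

def coalesce(requests):
    """Merge adjacent (offset, length) requests into larger transfers."""
    merged = []
    for off, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == off:
            merged[-1][1] += length          # extend the previous request
        else:
            merged.append([off, length])
    return [tuple(r) for r in merged]

# Eight 4 KiB writes that happen to be adjacent...
small = [(i * 4096, 4096) for i in range(8)]
print(coalesce(small))  # ...arrive at the DUT as one 32 KiB sequential write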


In summary, file system level testing can dramatically, and potentially inconsistently, alter the nature of the IO. When executing file system level testing, one potentially loses the 1:1 correspondence between the IO generated and the IO applied to the DUT. It is both the alteration of the IO and its potentially inconsistent transfer function that may lead to imprecise results.


If the primary interest and goal of end users is to properly and prudently match the performance needs of their particular applications to their specific storage purchases, file system level testing may lead to incorrect conclusions. The natural propensity to directly map (i.e., correlate) the advertised/reported performance metrics of storage devices (e.g., IOPS, MB/s, etc.) to the presumed workload characteristics of their applications is undermined by the wide and inconsistent series of variables and effects that act on IO operations as they traverse the OS software stack. This seemingly "natural" mapping is therefore very imprecise. This can easily be confirmed by collecting empirical IO operation performance metrics from the application perspective and comparing them at various key points within the OS software stack: such a comparison of sample IO traces will show the lack of a unity transfer function as the IO migrates from the application to the DUT and back.

The advantages of synthetic, block level testing

In contrast to user workload testing, with its potentially wide variance between tests and the potential loss of 1:1 correspondence between the generated IO and the actual IO applied to the DUT, the use of synthetic device level testing affords several advantages:

• The use of a known and repeatable stimulus, providing consistent

