Gregory Shiff, principal solutions architect, media & entertainment, Dell Technologies

We are in a new golden era of content creation. The explosion of streaming services has brought an unprecedented volume of new and amazing media. Production, post-production, visual effects, animation, finishing: everyone is booked solid with work. The expectations for this content are higher than ever, with new, technically challenging formats becoming the norm rather than the exception. Even in 2021, working with native 8K video or high frame rate 4K video (60 frames per second or more) is no joke. During post, storage and workstation performance can be huge bottlenecks. These bottlenecks can be particularly problematic for ‘hero’ seats that work with uncompressed media in real time.

Remote Direct Memory Access (RDMA), an ‘old’/‘new’ (what do those words mean anymore?) technology, improves storage and workstation performance simultaneously for systems handling the most demanding content. This article will examine using RDMA for NFS storage traffic over Ethernet. Why NFS? Well, Linux is the operating system of choice for media professionals working with applications that support the most challenging media. Even if applications have Windows or macOS variants, the Linux version is the one used at the truly high end. The native way for a Linux computer to access network storage is NFS and, in particular, NFS over TCP.

This article is already going down a rabbit hole of acronyms, so let us pause for a moment. I imagine that most people reading know about NFS (and SMB) and TCP (and UDP). For readers who are not familiar, NFS stands for Network File System. As said, NFS is how Linux systems talk to network storage (there are other ways, but mostly it is NFS). NFS traffic sits on top of other lower-level network protocols, in particular Transmission Control Protocol (TCP) or User Datagram Protocol (UDP), but mostly it is TCP. TCP does a great job of handling things like packet loss on congested networks, but that reliability comes at a cost in performance.

Back now to RDMA, which is a protocol that allows a client system to copy data from a storage server’s memory directly into that client’s memory. The client system bypasses many of the buffering layers inherent to TCP. This direct communication improves storage throughput and reduces latency in moving data between server and client. It also reduces CPU load on the client and storage server.

RDMA was developed in the 1990s to support high-performance compute workloads running over InfiniBand networks. In the 2000s, two methods of running RDMA over Ethernet networks were developed, namely iWARP and RoCE. iWARP uses TCP for RDMA communications and RoCE uses UDP. There are various benefits and drawbacks to the two approaches. For instance, iWARP’s reliance on TCP offers greater flexibility in network design but suffers from many of the performance drawbacks of native TCP communications. RoCE reduces CPU overhead compared to iWARP but requires a lossless network. Ultimately, RoCE is the clear winner given that we are looking for the maximum storage performance with the lowest CPU load.
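On a Linux client, the choice between TCP and RDMA for an NFS mount comes down to mount options. The following is a minimal sketch, not the exact configuration used in the testing for this article: it builds the `mount` command line for either transport using the standard Linux NFS client options `proto=rdma` and `port=20049` (the registered NFS over RDMA port). The helper name `nfs_mount_command`, the server name, the export path and the mount point are all hypothetical placeholders, and the NFS version is left optional because it depends on what the storage supports.

```python
import shlex
from typing import List, Optional

# Port registered for NFS over RDMA; used with the Linux client's proto=rdma option.
NFS_RDMA_PORT = 20049


def nfs_mount_command(
    export: str,
    mount_point: str,
    use_rdma: bool,
    nfs_version: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a mount command for an NFS export.

    export and mount_point are placeholders supplied by the caller;
    nfs_version (e.g. "3") is optional and depends on the storage.
    """
    options = []
    if nfs_version:
        options.append(f"vers={nfs_version}")
    if use_rdma:
        # RDMA transport: select the rdma protocol and its dedicated port.
        options += ["proto=rdma", f"port={NFS_RDMA_PORT}"]
    else:
        # Conventional transport: NFS over TCP.
        options.append("proto=tcp")
    return ["mount", "-t", "nfs", "-o", ",".join(options), export, mount_point]


if __name__ == "__main__":
    # Hypothetical server, export and mount point, for illustration only.
    for use_rdma in (False, True):
        cmd = nfs_mount_command(
            "storage.example.com:/ifs/media", "/mnt/media", use_rdma
        )
        print(shlex.join(cmd))
```

The sketch only prints the two command lines so they can be compared side by side; actually running them requires root privileges and RDMA-capable network adapters, switches and storage.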

Put that all together and you can run NFS traffic over RDMA leveraging RoCE. If your client workstation, network and storage all support NFS over RDMA (NFSoRDMA), you can massively boost performance by mounting the network storage with a few mount options. The performance gains of RDMA are impressive. All other things being equal, RDMA can be twice as performant as TCP (with a similar drop in workstation CPU utilisation).

“NFS over RDMA will play a vital role for creative companies working with 8K or high frame-rate 4K video”

Let us look at some real-world examples in media creation. First up, 8K uncompressed playback in DaVinci Resolve. Uncompressed video puts less strain on the workstation (no real-time decompression), but file sizes and bandwidth requirements are huge. In the testing for this article, an 8K DPX image sequence was put on the Dell EMC PowerScale network storage. As an image sequence, each frame of video is a separate file. At 8K resolution, each file is approximately 190MB. Sustaining 24 frames-per-second playback requires around 4.5GB/s. To cut a long story short, the image sequence would not play with the storage mounted over TCP. Mounting the exact same storage using RDMA was a night and day difference: 8K video at 24 frames per second over the network! Now let us look at workstation performance.

To be fair, uncompressed 8K video is unwieldy to store or work with. The number of facilities truly working in uncompressed 8K is small, and in fact 6K PIZ compressed OpenEXR is a more common format. OpenEXR is another image sequence format (file per frame) and PIZ compression is lossless, retaining full image fidelity. The PIZ compressed image sequence I used had frames between 80MB and 110MB each. Sustaining 24 frames per second required around 2.7GB/s.

This bandwidth is less than uncompressed 8K but still substantial. However, the real challenge is that the workstation needs to decompress each frame as it is being read. Frames were dropped in Resolve with the network storage mounted using TCP. The combination of CPU cycles required to read and decode each 6K frame from network storage was too much. RDMA was the key for this kind of playback. Remounting the storage using RDMA enabled smooth playback of this OpenEXR 6K PIZ image sequence over the network.

Going a little deeper with workstation performance, let us look at other common video formats: Sony XAVC and Apple ProRes 422HQ at full 4K DCI resolution and 59.94 frames per second, this time in Autodesk Flame. In debug mode, Flame shows dropped frames for video disk, GPU and broadcast output. Whether the file system was mounted using TCP or RDMA, the video disk never dropped a frame. The storage was plenty fast, as were the beefy Nvidia RTX GPUs. With the file system mounted using TCP, the broadcast output dropped thousands of frames; the workstation could not keep up. RDMA was a different story, with smooth broadcast output and essentially no dropped frames. In this case, it was all about the CPU cycles freed up by RDMA.

That was a lot of information in one fairly short article, so let me put it plainly: NFS over RDMA will play a vital role for creative companies working with 8K or high frame-rate 4K video. If you want to dig deeper into my testing and results, please visit powerscale-onefs-nfs-over-rdma-for-media/
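The playback bandwidth figures in the examples above come from simple arithmetic: frame size multiplied by frame rate. The sketch below is a hypothetical helper (the name `playback_bandwidth` is not from any tool used in the testing) and assumes decimal units, with 1GB equal to 1,000MB. It reports the sustained read rate in both gigabytes and gigabits per second, since the factor of eight between the two matters when comparing against network link speeds.

```python
from typing import Tuple


def playback_bandwidth(frame_size_mb: float, fps: float) -> Tuple[float, float]:
    """Return (gigabytes per second, gigabits per second) needed to
    sustain playback of an image sequence with one file per frame.

    Assumes decimal units: 1 GB = 1000 MB, 1 byte = 8 bits.
    """
    megabytes_per_second = frame_size_mb * fps
    gigabytes_per_second = megabytes_per_second / 1000
    gigabits_per_second = gigabytes_per_second * 8
    return gigabytes_per_second, gigabits_per_second


if __name__ == "__main__":
    # Frame sizes quoted in the article; all played back at 24 frames per second.
    sequences = [
        ("8K DPX, uncompressed", 190),
        ("6K OpenEXR PIZ, smallest frames", 80),
        ("6K OpenEXR PIZ, largest frames", 110),
    ]
    for label, frame_size_mb in sequences:
        gbytes, gbits = playback_bandwidth(frame_size_mb, 24)
        print(f"{label}: {gbytes:.2f} GB/s ({gbits:.1f} Gb/s)")
```

For the 190MB DPX frames this works out to roughly 4.56 gigabytes per second, and for the largest 110MB PIZ frames roughly 2.64 gigabytes per second.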
